NO.    196058 


NEW  YORK  UNIVERSITY 
INSTITUTE OF MATHEMATICAL SCIENCES

25 Waverly Place, New York 3, N. Y.




IMM-NYU   267 
MAY 1960




On  The  Foundations  of  Statistical  Inference 
I.  Binary  Experiments 


ALLAN BIRNBAUM


PREPARED  UNDER 

CONTRACT  NO.  NONR-285(38) 

WITH  THE  OFFICE  OF  NAVAL  RESEARCH 

UNITED  STATES  NAVY 


REPRODUCTION IN WHOLE OR IN PART

IS PERMITTED FOR ANY PURPOSE
OF THE UNITED STATES GOVERNMENT.




CONTENTS

0. Introduction and summary

A. Mathematical developments
1. The canonical form of a binary experiment
2. Simple binary experiments
3. The partial ordering of simple binary experiments
4. Mixtures of simple binary experiments
5. Decomposition theorem for binary experiments
6. The partial ordering of binary experiments

B. Inference methods with probabilistic justifications
7. On the mathematical treatment of statistical inference problems
8. Two-decision problems; tests of statistical hypotheses
9. Multi-decision problems; tests based on critical levels

C. Inference methods with intrinsic justifications
10. Evidential interpretations of outcomes
11. Symmetric simple binary experiments
12. Symmetric binary experiments
13. Binary experiments in general
14. Inferences based on the likelihood function
15. Appraisal and design of experiments for informative inference
16. Relations between statistical evidence and significance tests

D. Discussion
17. Relations of statistical evidence to prior information and to conclusions
18. References and acknowledgments

Appendix (Supplement to Part A): The algebra of statistical experiments



0. Introduction and summary. For experiments concerning two simple statistical hypotheses, the canonical forms of experiments, and their partial ordering, are discussed. It is proved that every such experiment is a mixture (in a probability sense) of simple experiments whose sample spaces each contain only two points.

This result is used, with standard frequency interpretations of probabilities, in an analysis of the foundations of statistical inference. This analysis establishes the likelihood function as the appropriate basis from which statistical inferences can be made directly, for the familiar general purposes of informative inference. For the numerical values of the likelihood function, this analysis provides direct interpretations in terms of probabilities of errors. These probabilities admit frequency interpretations of the usual kind, but they are not in general defined with reference to the specific experiment from which an outcome is obtained: they express intrinsic objective properties of the likelihood function itself, which this analysis shows to be appropriately relevant and directly useful for purposes of informative inference. The relations of this analysis of problems of informative inference to problems of testing statistical hypotheses, decision-making, conclusions, and Bayesian treatments of inference problems are discussed briefly.

Generalizations of these mathematical results and their interpretations for problems involving more than two simple hypotheses will be given in a following paper.


A. Mathematical developments.

1. The canonical form of a binary experiment. We consider a given experiment $E$, assuming that questions of experimental design, including those of choice of a sample size or possibly a sequential sampling rule, have been dealt with, and that the sample space of possible outcomes $x$ of $E$ is a specified set $S = \{x\}$. We assume that each of the possible distributions of $X$ is represented by a specified elementary probability function $f_i(x)$: if the hypothesis $H_i$ is true, the probability that $E$ yields an outcome $x$ in $A$ is

$$ P_i(A) = \int_A f_i(x)\,d\mu(x), $$

where $\mu$ is a specified $\sigma$-finite measure on $S$, and $A$ is any measurable set.

We assume until otherwise stated that there are only two possible distributions, so that $i = 1$ or $2$. Experiments for which this is the case will be termed binary experiments. (It is very useful to discuss more general experiments also under this simplifying assumption, to establish theoretical results and technical methods which can then be appropriately generalized.)

Discussions of statistical inference problems concerning binary experiments usually specify at the outset that the problem under consideration is that of testing the simple hypothesis $H_1$ against the simple alternative $H_2$, or that of making one of two specified decisions, on the basis of an observed value of $X$. Such discussions seem to assume tacitly that such formulations are the only ones of possible interest, or at least the only ones sufficiently definite to allow satisfactory theoretical treatment and objective practical application. (We do not consider here formulations in which it is assumed that there exist probabilities of the hypotheses themselves, $\mathrm{Prob}\,(H_i)$, $i = 1, 2$, in some sense.) However, we begin with a less formal but broader specification: the general goal is to make inferences from an observed value of $X$ to the hypotheses. Our purpose is, first, to show that this broader specification suffices to guide a useful analysis of the intrinsic mathematical structure of any given experiment $E$, which results in exhibiting $E$ in a canonical form. The latter form provides a convenient point at which one can consider the testing or two-decision problems as usually formulated, or alternatively some inference problems having different formulations which will be illustrated. Moreover, this canonical form exhibits some mathematical properties of the experiment $E$ which are of intrinsic interest and relevance for statistical inference in general; these properties are not fully and clearly exhibited by consideration of just those properties of $E$ which are relevant for specific inference problems such as the testing or two-decision problems.
For any given binary experiment $E$, let

$$ r = r(x) = \log\,[f_2(x)/f_1(x)]. $$

It is well known that $r$ is a (minimal) sufficient statistic. Let

$$ F_i(r) = \mathrm{Prob}\,[r(X) \le r \mid H_i], \qquad i = 1, 2. $$

In general $r(X)$ is a generalized random variable in the sense that it may assume infinite values with positive probability under one or both hypotheses; correspondingly, in general $F_1$ and $F_2$ are generalized cumulative distribution functions.

If $E^*$ is any second binary experiment which may be conducted for inferences concerning the same two hypotheses, with possible outcomes $x^*$ in $S^*$ and probability densities $f_i^*(x^*)$, $i = 1, 2$, let $r^* = r^*(x^*) = \log\,[f_2^*(x^*)/f_1^*(x^*)]$, and let $F_i^*(r)$, $i = 1, 2$, be the respective c.d.f.'s of $r^*(X^*)$ under $H_1$, $H_2$. Then the experiments $E$ and $E^*$ are equivalent if and only if $F_i^*(r) = F_i(r)$ for $-\infty \le r \le \infty$ and $i = 1, 2$. (This equivalence is not contingent upon any specific formulation of an inference or decision problem, but rests on the fact that an observation on $R^* = r^*(X^*)$ based upon $E^*$ is mathematically equivalent to an observation on $R = r(X)$ based upon $E$.) The pair of distributions $F_1, F_2$ of the sufficient statistic $r$ may be taken as a canonical form of any binary experiment $E$. Such a canonical form, while useful, lacks parsimony and flexibility in the sense that there are inherent mathematical restrictions on the range of possible pairs $F_1, F_2$ of generalized c.d.f.'s which can represent a binary experiment. For example, if $F_1(-\infty) = 1$, then necessarily $F_2(r) = 0$ for every $r < \infty$ (for then $f_2(x) = 0$ wherever $f_1(x) > 0$, so that under $H_2$ the statistic $r(X)$ equals $+\infty$ with probability one). A canonical form which is more convenient for many purposes is obtained as follows.

We proceed to define the continuous probability integral transformation of $r(X)$ under $H_1$. Let

$$ u(r,z) = z\,F_1(r) + (1-z)\,F_1(r-) $$

and

$$ v(r,z) = z\,F_2(r) + (1-z)\,F_2(r-) $$

for $0 \le z \le 1$ and $-\infty \le r \le \infty$. For each fixed $r$, $u(r,z)$ and $v(r,z)$ are nondecreasing in $z$, and one or both are increasing in $z$ according as $F_1$ or $F_2$ or both are discontinuous at $r$. For any fixed $z$ and $z'$, $r' > r$ implies $u(r',z') \ge u(r,z)$ and $v(r',z') \ge v(r,z)$. If $Z$ is a random variable which has, under each hypothesis $H_1$, $H_2$, the same uniform distribution on the unit interval, $0 \le Z \le 1$, independent of $X$, then under $H_1$ the random variable $U = u(R,Z)$ has a uniform distribution on the unit interval:

$$ \mathrm{Prob}\,[u(R,Z) \le u \mid H_1] = u, \qquad 0 \le u \le 1. $$

Thus the c.d.f. of $U$ under $H_1$ is represented by the main diagonal of the unit square ($v = u$). Similarly, $V = v(R,Z)$ is the continuous probability integral transform of $r(X)$ under $H_2$:

$$ \mathrm{Prob}\,[v(R,Z) \le v \mid H_2] = v, \qquad 0 \le v \le 1. $$

The c.d.f. of $V$ under $H_2$ is represented in the $(u,v)$ plane by regarding $v$ as an argument, and by taking again the main diagonal of the unit square ($u = v$) as the graph of the c.d.f.
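As a numerical illustration, not part of the original argument, the following Python sketch applies the randomized transform $u(r,z)$ to the binomial experiment used later in this report (four observations, $\theta = .2$ under $H_1$ and $.8$ under $H_2$). It draws outcomes under each hypothesis, and checks that $U = u(R,Z)$ looks uniform under $H_1$ and that under $H_2$ it falls below $F_1$ at the smallest value of $r$ only with probability $f_2(0) = .0016$. The helper names `u_of` and `sample_u` are hypothetical, and the check is a Monte Carlo approximation, not a proof.

```python
import math
import random

# Binomial illustration (assumed for this sketch): n = 4, theta = .2 under H1, .8 under H2.
N = 4
f1 = [math.comb(N, x) * 0.2**x * 0.8**(N - x) for x in range(N + 1)]
f2 = [math.comb(N, x) * 0.8**x * 0.2**(N - x) for x in range(N + 1)]

# r(x) = log[f2(x)/f1(x)] = (2x - 4) log 4 increases with x, so outcomes are sorted by r.
r = [math.log(f2[x] / f1[x]) for x in range(N + 1)]
assert all(a < b for a, b in zip(r, r[1:]))

# F1(r-) and F1(r) at each atom of r(X) under H1.
F1_left = [sum(f1[:x]) for x in range(N + 1)]
F1_right = [sum(f1[: x + 1]) for x in range(N + 1)]


def u_of(x, z):
    """Continuous probability integral transform u(r, z) = z F1(r) + (1 - z) F1(r-) at r = r(x)."""
    return z * F1_right[x] + (1 - z) * F1_left[x]


def sample_u(f, n_draws, seed):
    """Draw X from the distribution f and Z ~ Uniform(0, 1); return the values U = u(r(X), Z)."""
    rng = random.Random(seed)
    xs = rng.choices(range(N + 1), weights=f, k=n_draws)
    return [u_of(x, rng.random()) for x in xs]


u_h1 = sample_u(f1, 200_000, seed=1)
u_h2 = sample_u(f2, 200_000, seed=2)

# Under H1, U should be (approximately) uniform on [0, 1].
print(sum(u_h1) / len(u_h1))                     # close to 0.5
print(sum(u <= 0.25 for u in u_h1) / len(u_h1))  # close to 0.25
# Under H2, Prob[U <= F1(r at x = 0)] = Prob[X = 0 | H2] = .0016.
print(sum(u <= 0.4096 for u in u_h2) / len(u_h2))
```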

The function $u(r,z)$ is a sufficient statistic for the experiment $E$. It may be regarded as defined on the sample space $S' = \{(x,z) \mid x \in S,\ 0 \le z \le 1\}$ of the augmented experiment $E'$ which consists of $E$ supplemented by an observation on $Z$. Clearly $E'$ is equivalent to $E$, and $u(r,z)$ is sufficient for $E'$. (A best critical region of size $\alpha$ for testing $H_1$ against $H_2$ is $\{(x,z) \mid u(r,z) > 1 - \alpha\}$.)




To determine the c.d.f. of $U$ under $H_2$, let

$$ r[u] = \min\,\{r \mid u(r,1) = F_1(r) \ge u\}, $$

and let

$$ z(u') = \max\,\{z \mid 0 \le z \le 1,\ u(r[u'],z) \le u'\}, \qquad 0 \le u' \le 1. $$

Then

$$ \begin{aligned} \mathrm{Prob}\,[u(R,Z) \le u \mid H_2] &= \mathrm{Prob}\,[r(X) < r[u] \mid H_2] + z(u)\,[F_2(r[u]) - F_2(r[u]-)] \\ &= z(u)\,F_2(r[u]) + (1 - z(u))\,F_2(r[u]-) \\ &= v(r[u], z(u)), \end{aligned} $$

which we denote by $v(u)$, for $0 \le u \le 1$. To show that $v(u)$ is convex, suppose that $0 \le u_1 < u_2 < u_3 \le 1$, with $z(u_j) = 1$ for $j = 1, 2, 3$. By definition $v(u)$ is linear in $u$ on any interval for which $r[u]$ is constant; hence in verifying the convexity of $v(u)$, we need not consider any $u_j$ which is an interior point of such an interval. For $j = 1, 2$, let $A_j = \{x \mid r[u_j] \le r(x) < r[u_{j+1}]\}$, and consider

$$ v(u_{j+1}) - v(u_j) = \mathrm{Prob}\,[r[u_j] \le r(X) < r[u_{j+1}] \mid H_2] = P_2(A_j) $$

and

$$ u_{j+1} - u_j = \mathrm{Prob}\,[r[u_j] \le r(X) < r[u_{j+1}] \mid H_1] = P_1(A_j). $$

Now the generalized derivative of $P_2$ with respect to $P_1$ is

$$ \frac{dP_2}{dP_1} = f_2(x)/f_1(x) = e^{r(x)}, $$

which is less than $e^{r[u_2]}$ throughout $A_1$, but is at least as large as $e^{r[u_2]}$ throughout $A_2$. It follows that

$$ \frac{P_2(A_1)}{P_1(A_1)} < e^{r[u_2]} \le \frac{P_2(A_2)}{P_1(A_2)}, $$

that is,

$$ \frac{v(u_2) - v(u_1)}{u_2 - u_1} < \frac{v(u_3) - v(u_2)}{u_3 - u_2}, $$

which proves the convexity of $v(u)$. It follows that $v(u)$ is continuous except possibly at $u = 1$, where $v(1) = 1$ always.

The function $v(u)$ or its graph may be regarded as the canonical form of any binary experiment $E$. We proceed to show that every such convex function $v(u)$ on the closed unit interval represents in this way some binary experiment. Let $H_1$ specify that $U$ has the uniform density function, with respect to Lebesgue measure, $f_1(u) = 1$, $0 \le u \le 1$. Let $H_2$ specify that $U$ has any given convex c.d.f. $v(u)$ with $v(0) = 0$, $v(1) = 1$. Let $f_2(u)$ be the right derivative of $v(u)$ for each $u < 1$, and let $f_2(1) = \infty$. Then

$$ v(u) = \int_0^u f_2(t)\,dt \qquad \text{for each } u < 1; $$

if $v(u)$ is continuous at $u = 1$, the latter formula is also valid there; in general, we have $\mathrm{Prob}\,[U = 1 \mid H_2] = 1 - v(1-) \ge 0$. The sufficient statistic $r(x)$ takes the following form in this case:

$$ r = r(u) = \log\,[f_2(u)/f_1(u)] = \log f_2(u), \qquad 0 \le u < 1. $$

Its c.d.f. under $H_1$ is continuous, and the continuous probability integral transform $U$ of $r(u)$ under $H_1$ has as usual the uniform c.d.f.:

$$ \mathrm{Prob}\,[r(U) \le r(u) \mid H_1] = u, \qquad 0 \le u \le 1. $$

The distribution of $U$ under $H_2$ is thus given by the c.d.f. $v(u)$, completing the proof.

It is often convenient to consider a binary experiment $E$ as represented by its "$v(u)$ curve," and to consider the latter, if discontinuous, as supplemented by a vertical line segment, so that in all cases a $v(u)$ curve is a graphically continuous curve from $(0,0)$ to $(1,1)$.
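For an experiment with finitely many outcomes the $v(u)$ curve is polygonal: ordering the outcomes by increasing $r(x)$ and accumulating $P_1$ and $P_2$ gives its vertices, and the slope of each segment is the likelihood ratio $e^{r(x)}$ of the corresponding outcome, which is the convexity argument above in discrete form. The Python sketch below assumes a finite sample space with $f_1(x) > 0$ for every outcome; `v_curve_vertices` and `slopes` are hypothetical helper names, and exact arithmetic is done with `fractions.Fraction`.

```python
from fractions import Fraction
from math import comb


def v_curve_vertices(f1, f2):
    """Vertices of the v(u) curve of a finite binary experiment.

    f1, f2: probabilities of each outcome under H1 and H2 (assumes f1[x] > 0).
    Outcomes are sorted by the likelihood ratio f2/f1, i.e. by increasing r(x);
    each vertex is (F1(r), F2(r)) at successive values of r.
    """
    outcomes = sorted(range(len(f1)), key=lambda x: f2[x] / f1[x])
    u = v = Fraction(0)
    vertices = [(u, v)]
    for x in outcomes:
        u += f1[x]
        v += f2[x]
        vertices.append((u, v))
    return vertices


def slopes(vertices):
    """Slope of each segment; for distinct r-values these equal e^{r(x)} = f2(x)/f1(x)."""
    return [(v1 - v0) / (u1 - u0) for (u0, v0), (u1, v1) in zip(vertices, vertices[1:])]


# Binomial experiment: four observations, theta = .2 under H1, .8 under H2.
low, high = Fraction(1, 5), Fraction(4, 5)
f1 = [comb(4, x) * low**x * high**(4 - x) for x in range(5)]
f2 = [comb(4, x) * high**x * low**(4 - x) for x in range(5)]

verts = v_curve_vertices(f1, f2)
print([(float(u), float(v)) for u, v in verts])
print(slopes(verts))  # increasing slopes: the curve is convex
```

Here the slopes come out as $1/256, 1/16, 1, 16, 256$, the five possible likelihood ratios $4^{2x-4}$, so the polygon is convex as the general argument requires.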

2. Simple binary experiments. A binary experiment with $v(u) = u$ is trivial in the sense that its sufficient statistic $r = r(x)$ has the same distribution under each hypothesis. It is equivalent to an observation just on an auxiliary randomization variable $Z$ unrelated to the hypotheses. In any such experiment we have under each hypothesis that $r(X) = 0$ with probability one. Such experiments will be called uninformative, and all other experiments will be called informative.

A binary experiment will be called simple if its sufficient statistic $r$ assumes at most two distinct values, $r_1 \le r_2$ (with exceptions on sets of points $x$ having probability 0 under each hypothesis). A binary experiment which is not simple will be called composite. In an informative simple binary experiment, we have $r_1 < r_2$, with each value having positive probability under at least one hypothesis. In any such experiment, let

$$ p_i = \mathrm{Prob}\,[r(X) = r_2 \mid H_i], \qquad q_i = 1 - p_i, \qquad i = 1, 2. $$

Then $0 \le p_1 < p_2 \le 1$, or $0 \le q_2 < q_1 \le 1$; the point $(q_1,q_2)$ characterizes any such experiment, since its $v(u)$ curve consists of two line segments connecting successively the points $(0,0)$, $(q_1,q_2)$, $(1,1)$.

Conversely, every such $v(u)$ curve, or every point $(q_1,q_2)$ with $0 \le q_2 < q_1 \le 1$, characterizes an informative simple binary experiment. For consider any such pair and the experiment $E$ consisting of a single Bernoulli trial such that

$$ q_i = \mathrm{Prob}\,[X = 0 \mid H_i] \quad\text{and}\quad p_i = 1 - q_i = \mathrm{Prob}\,[X = 1 \mid H_i], \qquad i = 1, 2. $$

Its sufficient statistic is

$$ r(x) = \begin{cases} r_1 = \log\,(q_2/q_1) & \text{if } x = 0, \\ r_2 = \log\,(p_2/p_1) & \text{if } x = 1. \end{cases} $$

Any such experiment may be characterized by a point $(q_1,q_2)$ in the range indicated above; or alternatively by a point $(r_1,r_2)$ satisfying $-\infty \le r_1 < 0 < r_2 \le \infty$, that is, by a point in the second quadrant of the $(r_1,r_2)$-plane excluding the coordinate axes but including all points with one or both coordinates infinite.

A third representation of any informative simple binary experiment is given by the ordered pair $(L_1,L_2)$ of possible values of the likelihood ratio statistic:

$$ L_1 = q_2/q_1 = e^{r_1}, \qquad L_2 = p_2/p_1 = e^{r_2}, \qquad 0 \le L_1 < 1 < L_2 \le \infty, $$

so that $q_1 = (L_2 - 1)/(L_2 - L_1)$ and $q_2 = L_1 q_1$. A fourth representation of any such experiment is given by considering the only nontrivial nonrandomized best test of $H_1$ against $H_2$, which rejects $H_1$ just when $r(x) = r_2$; the probabilities of errors of Types I and II respectively are $(\alpha,\beta) = (p_1,q_2)$, which satisfy $\alpha + \beta < 1$. A fifth useful representation of any such experiment is by means of a stochastic matrix:

$$ E = \begin{pmatrix} q_1 & p_1 \\ q_2 & p_2 \end{pmatrix}. $$

An uninformative simple binary experiment is represented by $(r_1,r_2) = (0,0)$, or by $(L_1,L_2) = (1,1)$, or by $(q_1,q_2) = (q_1,q_1)$ for any $q_1$, or by $(\alpha,\beta) = (\alpha, 1-\alpha)$ for any $\alpha$.
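The five representations are related by elementary algebra. The Python sketch below computes the others from $(L_1,L_2)$ with exact rational arithmetic; the function name `from_likelihood_ratios` is hypothetical, and finite $L_2$ is assumed so that `Fraction` can be used throughout. For $(L_1,L_2) = (1/16, 256)$ it gives $(p_1,p_2) \approx (.0037,.9377)$ and $(\alpha,\beta) \approx (.0037,.0623)$.

```python
import math
from fractions import Fraction


def from_likelihood_ratios(L1, L2):
    """Representations of an informative simple binary experiment from (L1, L2).

    Assumes 0 <= L1 < 1 < L2 < infinity (finite L2, so exact Fractions can be used).
    """
    L1, L2 = Fraction(L1), Fraction(L2)
    if not (0 <= L1 < 1 < L2):
        raise ValueError("need 0 <= L1 < 1 < L2")
    q1 = (L2 - 1) / (L2 - L1)
    q2 = L1 * q1
    p1, p2 = 1 - q1, 1 - q2
    r1 = math.log(float(L1)) if L1 > 0 else -math.inf
    r2 = math.log(float(L2))
    alpha, beta = p1, q2            # error probabilities of the best test (reject H1 iff r = r2)
    matrix = [[q1, p1], [q2, p2]]   # rows: H1, H2; columns: x = 0, x = 1
    return {"q": (q1, q2), "p": (p1, p2), "r": (r1, r2),
            "alpha_beta": (alpha, beta), "matrix": matrix}


rep = from_likelihood_ratios(Fraction(1, 16), 256)
print([round(float(p), 4) for p in rep["p"]])           # (p1, p2)
print([round(float(e), 4) for e in rep["alpha_beta"]])  # (alpha, beta)
```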

Examples

1. "One toss of a coin" experiments. As indicated above, every simple binary experiment is equivalent to an experiment consisting of a single observation on a Bernoulli random variable $X$ with possible values 0 or 1 only.



2. A Wald sequential probability ratio test between two simple hypotheses, in special cases including certain tests on a binomial parameter (the cases in which there is "no excess at termination"), is based on a sequential sampling rule which allows only two values for the likelihood ratio statistic, or for $r(x)$. In many other cases, such tests might be called approximately simple in the sense that under each hypothesis the probability of $r(X) = r_1$ or $r_2$ is very near unity.
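A hypothetical "no excess" case makes this concrete: take Bernoulli observations with success probability $.2$ under $H_1$ and $.8$ under $H_2$, and stop as soon as $r$ (the log likelihood ratio of the observations so far) reaches $\pm\log 16$. Each success adds $\log 4$ and each failure subtracts $\log 4$, so the test stops exactly on a boundary and the terminal $r$ takes only two values. The Python sketch below uses the standard gambler's-ruin formula for the probability of stopping at the upper boundary; the parameter values and the name `prob_upper` are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction


def prob_upper(theta, k, n):
    """Gambler's-ruin probability that a +1/-1 walk started at k reaches n before 0.

    theta is the probability of a +1 step (theta != 1/2 assumed).
    """
    rho = (1 - theta) / theta
    return (1 - rho**k) / (1 - rho**n)


# r moves in steps of log 4; the boundaries -log 16 and +log 16 are two steps away,
# so on the scale 0..4 the walk starts at k = 2.
p1 = prob_upper(Fraction(1, 5), 2, 4)  # Prob[r = +log 16 | H1]
p2 = prob_upper(Fraction(4, 5), 2, 4)  # Prob[r = +log 16 | H2]
q1, q2 = 1 - p1, 1 - p2
print(p1, p2)            # 1/17 16/17
print(q2 / q1, p2 / p1)  # (L1, L2) = 1/16 16
```

The terminal outcome of this sequential test is therefore a simple binary experiment with $(L_1,L_2) = (1/16, 16)$.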

3. Communication channels. In communication theory (information theory), a communication channel (without memory) is any structure which can receive at one point any one of a specified set of "input signals" and deliver at another point one of a designated set of "output signals," the respective probabilities of the latter depending only upon the selected input signal. In the case of just two input signals, which we may denote by $H_1$, $H_2$, we have a binary channel; we may denote the set of possible output signals by $S = \{x\}$, and the respective probabilities of subsets $A$ of $S$ by $P_i(A)$, $i = 1, 2$. Thus each such communication channel is mathematically equivalent to a binary experiment, and conversely. If $x = 0$ or $1$ only, we have a simple binary ("two-by-two") channel, equivalent to a simple binary experiment. Here $(\alpha,\beta)$ describe completely the structure of "noise" in the channel: $\alpha$ is the probability that transmission of $H_1$ will lead to receiving of $x = 1$, and $\beta$ is the probability that transmission of $H_2$ will lead to receiving of $x = 0$.



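Noisy channels in series. It is convenient to introduce and illustrate some techniques required below as an elaboration of the present example. Let channel $E$ have inputs $H_1$, $H_2$, outputs $x = 0$ or $1$, and noise parameters $(\alpha,\beta)$. Let channel $E'$ have inputs $x = 0$ or $1$, outputs $x' = 0$ or $1$, and noise parameters $(\alpha',\beta')$. Then the channel $E^*$ consisting of $E$ followed by $E'$ has inputs $H_1$, $H_2$, and outputs $x' = 0$ or $1$. It is useful to write $E^* = EE'$, since if

$$ E = \begin{pmatrix} q_1 & p_1 \\ q_2 & p_2 \end{pmatrix} \quad\text{and}\quad E' = \begin{pmatrix} q_1' & p_1' \\ q_2' & p_2' \end{pmatrix}, $$

then

$$ EE' = \begin{pmatrix} q_1 & p_1 \\ q_2 & p_2 \end{pmatrix} \begin{pmatrix} q_1' & p_1' \\ q_2' & p_2' \end{pmatrix} = \begin{pmatrix} q_1 q_1' + p_1 q_2' & q_1 p_1' + p_1 p_2' \\ q_2 q_1' + p_2 q_2' & q_2 p_1' + p_2 p_2' \end{pmatrix}. $$

The noise parameters of $E^*$ are

$$ (\alpha^*, \beta^*) = (p_1^*, q_2^*) = \bigl((1-\alpha)\alpha' + \alpha(1-\beta'),\ \beta(1-\alpha') + (1-\beta)\beta'\bigr). $$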


The other representations of $E^*$ include

$$ L_1^* = q_2^*/q_1^* = (q_2 q_1' + p_2 q_2')/(q_1 q_1' + p_1 q_2') $$

and

$$ L_2^* = p_2^*/p_1^* = (q_2 p_1' + p_2 p_2')/(q_1 p_1' + p_1 p_2'). $$
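Series composition is just a product of $2 \times 2$ stochastic matrices, as the following Python sketch shows. It uses hypothetical noise parameters $(\alpha,\beta) = (.1,.2)$ and $(\alpha',\beta') = (.05,.3)$, checks the formula for $(\alpha^*,\beta^*)$, and confirms that for this example $L_1 \le L_1^* \le 1 \le L_2^* \le L_2$. The helper names `channel` and `in_series` are assumptions of the sketch.

```python
from fractions import Fraction as F


def channel(alpha, beta):
    """Stochastic matrix of a two-by-two channel with noise parameters (alpha, beta).

    Row i lists Prob[x = 0], Prob[x = 1] when input H_i is sent:
    alpha = Prob[x = 1 | H1], beta = Prob[x = 0 | H2].
    """
    return [[1 - alpha, alpha], [beta, 1 - beta]]


def in_series(E, E_prime):
    """Channel E followed by channel E': the matrix product EE'."""
    return [[sum(E[i][k] * E_prime[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


a, b = F(1, 10), F(1, 5)     # hypothetical (alpha, beta) of E
a2, b2 = F(1, 20), F(3, 10)  # hypothetical (alpha', beta') of E'
E_star = in_series(channel(a, b), channel(a2, b2))

alpha_star, beta_star = E_star[0][1], E_star[1][0]
print(alpha_star, beta_star)  # 23/200 43/100

# Likelihood-ratio parameters of E and of E* = EE'.
L1, L2 = b / (1 - a), (1 - b) / a
L1_star = E_star[1][0] / E_star[0][0]
L2_star = E_star[1][1] / E_star[0][1]
print(L1 <= L1_star <= 1 <= L2_star <= L2)  # True
```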

If $q_2' = 0$ but $p_1' > 0$, we may say that $E'$ has noise affecting only the transmitted signal $x = 0$; in this case we may also say that $E'$ has noise which degrades only the received signal $x' = 1$, since the received signal $x' = 0$ is known with certainty to follow from a transmitted signal $x = 0$, while a received signal $x' = 1$ is known to be possible following either transmitted signal $x = 0$ or $1$. In such a case we have $L_1^* = q_2/q_1 = L_1$, and

$$ L_2^* = (p_2 + q_2 p_1')/(p_1 + q_1 p_1') < p_2/p_1 = L_2 $$

(assuming $p_1 < p_2$, the remaining case being trivial). Similarly, if $p_1' = 0$ but $q_2' > 0$, $E'$ has noise affecting only $x = 1$ and degrading only $x' = 0$, and $L_2^* = p_2/p_1 = L_2$,

$$ L_1^* = (q_2 + p_2 q_2')/(q_1 + p_1 q_2') > q_2/q_1 = L_1 $$

(assuming the nontrivial case $p_1 < p_2$). It is easily verified that every channel $E'$ is equivalent to a pair of channels in series, $E' = E_1' E_2'$, where $E_1'$ has noise affecting at most the signal $x = 1$, and $E_2'$ has noise affecting at most the signal $x = 0$.
It follows that for any simple binary channels $E$, with parameters $(L_1,L_2)$, and $E'$, the channel $E^* = EE'$ has parameters $(L_1^*,L_2^*)$ satisfying $L_1 \le L_1^* \le 1 \le L_2^* \le L_2$; and conversely, if $E$ and $E^*$ are channels with parameters satisfying these inequalities, then there exists a channel $E'$ such that $E^* = EE'$. Since $r_i = \log L_i$, these inequalities may be written

$$ r_1 \le r_1^* \le 0 \le r_2^* \le r_2. $$

4. In every binary experiment, if the outcome $x$ is to be reported only by a conclusion of the form "reject $H_1$" or "accept $H_1$" based on a specified significance test with error probabilities $(\alpha,\beta)$, then the over-all procedure is formally a simple binary experiment, with $L_1 = \beta/(1-\alpha)$, $L_2 = (1-\beta)/\alpha$.

3. The partial ordering of simple binary experiments. It is natural to call one experiment $E$ at least as informative as another experiment $E^*$ if and only if it is possible to use $E$, possibly supplemented by use of an auxiliary randomization variable, to construct an experiment equivalent to $E^*$. We proceed to show that for simple binary experiments this implicit definition is equivalent to the following:

Explicit definition: $E: (r_1,r_2)$ is at least as informative as $E^*: (r_1^*,r_2^*)$ if and only if $r_1 \le r_1^* \le r_2^* \le r_2$.

To prove the equivalence of the two definitions, we show first that the latter condition is necessary for the former. If the latter condition fails, it is readily verified that the point $(q_1^*,q_2^*)$ of the curve $v^*(u)$ representing $E^*$ lies below the curve $v(u)$ representing $E$. Hence the best test of $H_1$ of size $p_1^* = 1 - q_1^*$ based upon $E^*$ has power $p_2^* = 1 - q_2^*$ exceeding the power $1 - v(q_1^*)$ of the best test of size $p_1^*$ based upon $E$ (allowing randomization). Thus if $E^*$ can be constructed from $E$ with possible randomization, it is necessary that $r_1 \le r_1^* \le r_2^* \le r_2$.

To prove the sufficiency of the latter condition, it suffices to apply directly the concluding discussion of Example 3 of the preceding section, identifying the experiments $E$ and $E^*$ with the channels denoted there by the same symbols, and interpreting the (noisy) channel $E'$ added there as an auxiliary randomization device used with the experiment $E$ to construct $E^* = EE'$.
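The constructive step can be checked numerically. For an informative $E$ the matrix $E$ is invertible (its determinant is $q_1 - q_2 > 0$), so the only candidate for the randomization channel is $E' = E^{-1}E^*$; its rows always sum to 1, and a channel $E'$ with $E^* = EE'$ exists exactly when this matrix has no negative entries. The Python sketch below (hypothetical helper names, exact arithmetic) tries the experiments with $(L_1,L_2) = (1/256, 256)$ and $(1/16, 16)$ in both directions.

```python
from fractions import Fraction as F


def from_L(L1, L2):
    """Stochastic matrix [[q1, p1], [q2, p2]] of the simple experiment (L1, L2)."""
    q1 = (L2 - 1) / (L2 - L1)
    q2 = L1 * q1
    return [[q1, 1 - q1], [q2, 1 - q2]]


def inverse(M):
    """Inverse of a 2x2 matrix; det = q1 - q2 > 0 for an informative experiment."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def randomizing_channel(E, E_star):
    """Return the channel E' with E_star = E E' if one exists, else None."""
    E_prime = matmul(inverse(E), E_star)
    return E_prime if all(x >= 0 for row in E_prime for x in row) else None


E_wide = from_L(F(1, 256), F(256))  # (r1, r2) = (-log 256, log 256)
E_narrow = from_L(F(1, 16), F(16))  # (r1, r2) = (-log 16, log 16)

Ep = randomizing_channel(E_wide, E_narrow)
print(Ep)                                    # a genuine channel
print(matmul(E_wide, Ep) == E_narrow)        # True
print(randomizing_channel(E_narrow, E_wide)) # None: information cannot be gained
```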

To denote that $E$ is at least as informative as $E^*$, we write $E \ge E^*$ or $E^* \le E$. It is also convenient to denote this relation by writing that $E$ contains $E^*$, since this terminology has been used in connection with communication channels, and since $E \ge E^*$ if and only if the interval $(r_1,r_2)$ contains the interval $(r_1^*,r_2^*)$ (or the interval $(L_1,L_2)$ contains the interval $(L_1^*,L_2^*)$).

If $E \ge E^*$ and $E^* \ge E$, we have $(r_1,r_2) = (r_1^*,r_2^*)$, and we write $E = E^*$ to denote that $E$ is equivalent to $E^*$. We write $E \ne E^*$ to denote that $E$ and $E^*$ are not equivalent. If $E \ge E^*$ and $E \ne E^*$, we write $E > E^*$ to denote that $E$ is more informative than $E^*$. If neither $E \ge E^*$ nor $E^* \ge E$ holds, we write $E \not\sim E^*$ to denote that $E$ and $E^*$ are not comparable; otherwise we write $E \sim E^*$ to denote that they are comparable. (The present definition of "more informative than" differs from another definition which has been used in comparing experiments, in its exclusion of the case of equivalence.)
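With the explicit definition, comparing two simple binary experiments reduces to checking interval containment. A small Python sketch, with a hypothetical function name and string labels, applied to the experiments $(-\log 16, \log 16)$, $(-\log 16, \log 256)$ and $(-\log 256, \log 16)$:

```python
import math


def compare(E, E_star):
    """Compare simple binary experiments given as (r1, r2) with r1 <= 0 <= r2.

    E >= E* iff r1 <= r1* <= r2* <= r2, i.e. the interval (r1, r2) contains (r1*, r2*).
    Returns '=', '>', '<' or 'not comparable'.
    """
    (r1, r2), (s1, s2) = E, E_star
    e_ge = r1 <= s1 and s2 <= r2
    s_ge = s1 <= r1 and r2 <= s2
    if e_ge and s_ge:
        return "="
    if e_ge:
        return ">"
    if s_ge:
        return "<"
    return "not comparable"


log = math.log
E1 = (-log(16), log(16))
E3 = (-log(16), log(256))
E4 = (-log(256), log(16))
print(compare(E3, E1))  # >
print(compare(E3, E4))  # not comparable
```

In the $(r_1,r_2)$ plane, the two incomparable points $E_3$ and $E_4$ lie one to the upper right of the other, as described below.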

The partial ordering of simple binary experiments determined by the relation $\ge$ is conveniently represented graphically in the $(r_1,r_2)$ plane. $E > E^*$ denotes that $(r_1^*,r_2^*)$ is closer than $(r_1,r_2)$ to $(0,0)$ in the sense that at least one of its coordinates is closer to 0 and neither is farther. $E \not\sim E^*$ denotes that one of the points $(r_1,r_2)$, $(r_1^*,r_2^*)$ lies to the upper right of the other.

Any finite or infinite set of experiments will be called strictly ordered if, of every pair in the set, one is more informative. Each such set of experiments corresponds to a subset of the points $(r_1,r_2)$ of some graphically continuous nonincreasing curve from $(-\infty,\infty)$ to $(0,0)$. Any such set of experiments has a parametric representation $(r_1[d], r_2[d])$, with $r_1[d]$ nondecreasing and $r_2[d]$ nonincreasing in $d$, where $d$ has a specified range.

4. Mixtures of simple binary experiments. If various experiments are possible for a given inference problem, and if one of these is selected for use by means of a specified random device unrelated to the hypotheses, the over-all procedure is called a mixture of experiments, or a mixture experiment. Since each simple binary experiment is represented by a point $(r_1,r_2)$ in the range described above, the various (generalized) cumulative distribution functions $G(r_1,r_2)$ on that range correspond to the possible mixtures of simple binary experiments. For any such distribution $G$, we write $E_G$ to designate the (mixture) experiment consisting of the selection of a simple experiment $(r_1,r_2)$ by use of a random device corresponding to $G$, and the observation of the outcome of one trial of the selected experiment; the simple experiments will be called components of $E_G$.

Any such mixture experiment $E_G$ has the generic sample point $x = (r_1,r_2,r_3)$, where $(r_1,r_2)$ is the selected simple experiment and $r_3$ is the observed outcome of that experiment, $r_3 = r_1$ or $r_2$. To determine the sufficient statistic $r(x) = r(r_1,r_2,r_3)$ of this mixture experiment, let $f_i(r_1,r_2,r_3)$ denote the probability or probability density of $(r_1,r_2,r_3)$ if $H_i$ is true, $i = 1, 2$.

The conditional distributions of $R_3$, given $(R_1,R_2) = (r_1,r_2)$, are, if $r_1 < r_2$,

$$ \mathrm{Prob}\,[R_3 = r_1 \mid (r_1,r_2), H_i] = q_i, \qquad \mathrm{Prob}\,[R_3 = r_2 \mid (r_1,r_2), H_i] = p_i = 1 - q_i, \qquad i = 1, 2, $$

where $q_i = q_i(r_1,r_2)$ are determined as above by $r_1 = \log\,(q_2/q_1)$, $r_2 = \log\,(p_2/p_1)$. If $r_1 = r_2 = 0$, then $R_3 = 0$, and we may take $p_1 = p_2 = 1$. Hence the marginal probability or probability density of $(r_1,r_2)$ is

$$ f_i(r_1,r_2) = \begin{cases} f_i(0,0,0) & \text{if } r_1 = r_2 = 0, \\ f_i(r_1,r_2,r_1) + f_i(r_1,r_2,r_2) & \text{if } r_1 < r_2, \end{cases} $$

for $i = 1, 2$. However, $f_1(r_1,r_2) \equiv f_2(r_1,r_2)$ (a.e., $H_1$ and $H_2$), since the distribution $G$ of $(R_1,R_2)$ is independent of the hypotheses. Hence, writing $f(r_1,r_2)$ for this common value, we can write

$$ f_i(r_1,r_2,r_3) = \begin{cases} f(0,0) & \text{if } r_1 = r_2 = r_3 = 0, \\ f(r_1,r_2)\,q_i & \text{if } r_3 = r_1 < r_2, \\ f(r_1,r_2)\,p_i & \text{if } r_3 = r_2 > r_1, \end{cases} $$

for $i = 1, 2$. Hence the sufficient statistic of $E_G$, an arbitrary mixture $G(r_1,r_2)$ of simple binary experiments $(r_1,r_2)$, is

$$ r(x) = r(r_1,r_2,r_3) = \log\,[f_2(r_1,r_2,r_3)/f_1(r_1,r_2,r_3)] = r_3. $$
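A small numerical check of this result, as a Python sketch: take a hypothetical mixing distribution $G$ putting probabilities $1/4$, $1/2$, $1/4$ on three simple experiments given by their $(L_1,L_2)$, list the joint probabilities $f_i(r_1,r_2,r_3)$ of all sample points, and confirm that $\log\,[f_2/f_1]$ equals $r_3$ at every point. The components, the weights and the helper name `q_pair` are illustrative assumptions.

```python
import math
from fractions import Fraction as F


def q_pair(L1, L2):
    """(q1, q2) of the simple experiment with likelihood ratios (L1, L2)."""
    q1 = (L2 - 1) / (L2 - L1)
    return q1, L1 * q1


# Hypothetical mixing distribution G over components given by (L1, L2).
G = {(F(1, 2), F(3)): F(1, 4), (F(1, 16), F(16)): F(1, 2), (F(1, 5), F(2)): F(1, 4)}

points = []  # (r3, f1(r1, r2, r3), f2(r1, r2, r3)) for every sample point
for (L1, L2), g in G.items():
    r1, r2 = math.log(float(L1)), math.log(float(L2))
    q1, q2 = q_pair(L1, L2)
    points.append((r1, g * q1, g * q2))              # outcome r3 = r1
    points.append((r2, g * (1 - q1), g * (1 - q2)))  # outcome r3 = r2

for r3, f1, f2 in points:
    print(f"r3 = {r3:+.4f}   log(f2/f1) = {math.log(float(f2 / f1)):+.4f}")
```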

Examples 

1. Binomial mean. Consider the five simple binary experiments $E_0, E_1, \ldots, E_4$ defined by the respective pairs of parameters $(L_1,L_2)$ given in Table I below. The table gives also the parameters $(p_1,p_2)$ and $(\alpha,\beta)$ of these experiments to four-decimal accuracy. The table also gives a number of discrete distributions $G^c = G^c(r_1,r_2)$: for each $c$, $0 \le c \le 1$, a mixture experiment $E_{G^c}$ is defined by the five probabilities $g_i^c = \mathrm{Prob}\,(E_i)$, $i = 0, 1, \ldots, 4$. It is convenient to use the notation $g_0 E_0 + g_1 E_1 + \cdots + g_4 E_4$ to denote the operation of mixing the experiments $E_0, \ldots, E_4$ with respective probabilities $g_0, \ldots, g_4$. We can then write, for each $c$, $0 \le c \le 1$,

$$ E_{G^c} = \sum_{i=0}^{4} g_i^c E_i. $$


Consider next the binomial experiment $E_B$ consisting of four observations, with parameter $\theta = .2$ or $.8$:

$$ f_1(x) = \binom{4}{x}(.2)^x(.8)^{4-x}, \qquad f_2(x) = \binom{4}{x}(.8)^x(.2)^{4-x}, \qquad x = 0, 1, \ldots, 4. $$

The following assertion can be verified by simple direct calculations: The mixture experiments $E_{G^c}$ defined above are equivalent


TABLE I

SOME SIMPLE BINARY EXPERIMENTS

Experiment   (p_1, p_2)       (L_1, L_2)     (α, β)
E_0          (.5, .5)         (1, 1)         (.5, .5)
E_1          (.0588, .9412)   (1/16, 16)     (.0588, .0588)
E_2          (.0039, .9961)   (1/256, 256)   (.0039, .0039)
E_3          (.0037, .9377)   (1/16, 256)    (.0037, .0623)
E_4          (.0623, .9963)   (1/256, 16)    (.0623, .0037)

SOME DISTRIBUTIONS DEFINING MIXTURES OF THE ABOVE EXPERIMENTS

$G^c$, $0 \le c \le 1$ (with $g_i^c = \mathrm{Prob}\,(E_i)$):

$g_0 = \binom{4}{2}(.2)^2(.8)^2 = .1536$
$g_1 = \binom{4}{1}(.2)(.8)^3 + \binom{4}{3}(.2)^3(.8) = .4352$
$g_2 = \binom{4}{0}(.8)^4 + \binom{4}{4}(.2)^4 = .4112$
$g_3 = g_4 = (1 - g_0)/2 = .4232$

$g_0^c = g_0 = .1536$, $g_1^c = (1-c)\,g_1$, $g_2^c = (1-c)\,g_2$, $g_3^c = c\,g_3$, $g_4^c = c\,g_4$


FIGURE   I,      CANONICAL  FOIiMS    OF  A   BINOMIAL  EXPERIMENT 
E   AND   ITS   FOUR  SIMPLE   COMPONENTS 
(Graphs   ofv(u),    0<u<l) 


E: 


E, 


u 


E,: 


E^: 


E3: 


X- 


Partial   ordiering:   Eq  <  E^  <  E^   <  E^,   E,    <  E.     <  Ep,   E     <  E  <  E    , 
Eq  <  E  <  E^    . 


20 


to  one  another,  and  each  is  equivalent  to  E„o   That  is, 
Eo  =  E   for  each  c,  0  <  c  <  lo   The  v(u)  curve  of  E„  is  easily 
determined  from  the  given  binomial  distributions  f,(x),  and  consists 
of  the  line  segments  between  the  successive  points  (given  to  four- 
decimal  accuracy):  (0,0),  (  »590l+,oOOl6  ),  ( o8l92,o0272) ,  ( ,9728,  ,l808)  , 
( ,998[|.,.i|096),  and  (1,1).   It  may  be  noted  that  only  one  of  the 
above  distributions  G  represents  a  mixture  of  strictly  ordered 
simple  binary  experiments,  namely  G  =  G, 

Example 2. Normal mean. The symmetrical simple binary experiments $(r_1, r_2)$ are those for which $r_1 = -r_2$. Any mixture $G(r_1, r_2)$ over this strictly ordered class of experiments can be represented conveniently by the marginal c.d.f. of $r_2$ under $G$, which we denote by $G(r_2)$. Let

$$G(r_2) = \Phi(r_2 - \tfrac12) - \Phi(-r_2 - \tfrac12), \qquad 0 \le r_2 \le \infty,$$

where $\Phi(u) = \int_{-\infty}^{u} \phi(t)\,dt$ and $\phi(u) = (2\pi)^{-1/2} e^{-u^2/2}$. Then $G(r_2) = \int_0^{r_2} g(y)\,dy$, where

$$g(y) = \phi(y - \tfrac12) + \phi(-y - \tfrac12).$$

Under hypothesis $H_1$, the sufficient statistic $r$ of the mixture experiment $E_G$ has the density function

$$f_1(r) = \begin{cases} g(r_2)\,q_1(-r_2, r_2) & \text{if } r < 0, \\ g(r_2)\,p_1(-r_2, r_2) & \text{if } r > 0, \\ \tfrac12\,g(0) & \text{if } r = 0, \end{cases}$$

where $r_2 = |r|$, $p_1(-r_2, r_2) = (e^{-r_2} - 1)/(e^{-r_2} - e^{r_2})$, $p_2(-r_2, r_2) = e^{r_2}\,p_1(-r_2, r_2)$, and $q_i(-r_2, r_2) = 1 - p_i(-r_2, r_2)$, for $i = 1, 2$. Upon simplification we find that $f_1(r) = \phi(r + \tfrac12)$ and $f_2(r) = \phi(r - \tfrac12)$; thus the sufficient statistic $r$ has under each hypothesis a normal distribution with unit variance, with respective means $-\tfrac12$ and $\tfrac12$.

Consider next the experiment $E_N$ consisting of a single observation on a normally distributed random variable $X$, having unit variance and, under the respective hypotheses, means $-\tfrac12$ and $\tfrac12$. It is well known that for this experiment the sufficient statistic is the log likelihood ratio $r(x) = x$, which has under the respective hypotheses the same (normal) distributions found above for the sufficient statistic $r$ of the mixture experiment $E_G$. It follows that the two experiments are equivalent: $E_G = E_N$.
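The simplification $f_1(r) = \phi(r + \tfrac12)$, $f_2(r) = \phi(r - \tfrac12)$ can be spot-checked numerically from the case formula, using the standard-library `statistics.NormalDist` for $\phi$. This is a check at a few points away from $r = 0$, not a proof; the helper names are illustrative.

```python
from math import exp
from statistics import NormalDist

phi = NormalDist().pdf          # standard normal density

# Illustrative helpers (hypothetical names) for the mixture E_G of Example 2.
def g(y):
    """Density of r2 under the mixing c.d.f. G(r2)."""
    return phi(y - 0.5) + phi(-y - 0.5)

def p1(r2):
    """H1-probability of the outcome with log likelihood ratio +r2 in the experiment (-r2, r2)."""
    return (exp(-r2) - 1) / (exp(-r2) - exp(r2))

def mixture_densities(r):
    """(f1(r), f2(r)) for the sufficient statistic r of E_G, r != 0."""
    r2 = abs(r)
    p = p1(r2)
    q = exp(r2) * p             # p2(-r2, r2)
    if r > 0:
        return g(r2) * p, g(r2) * q
    return g(r2) * (1 - p), g(r2) * (1 - q)

for r in (-2.0, -0.3, 0.4, 1.7):
    a, b = mixture_densities(r)
    print(f"r = {r:+.1f}: f1 = {a:.6f} vs phi(r + 1/2) = {phi(r + 0.5):.6f}; "
          f"f2 = {b:.6f} vs phi(r - 1/2) = {phi(r - 0.5):.6f}")
```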

5. Decomposition theorem for binary experiments. In the preceding examples, two binary experiments typical of those treated in mathematical statistics were shown to be mathematically equivalent to certain probability mixtures of specified simple binary experiments. The following theorem shows that every binary experiment can be decomposed in this sense into simple components.

Theorem. Each binary experiment is equivalent to a mixture of strictly ordered simple binary experiments.

Proof:

1. Let $v(u)$ be an arbitrary convex c.d.f. on the closed unit interval, $v(0) = 0$, $v(1) = 1$, representing as above any given binary experiment $E$. $E$ has the sufficient statistic $u$ with distributions

$$\mathrm{Prob}\{U \le u \mid H_1\} = u = \int_0^u du, \qquad \mathrm{Prob}\{U \le u \mid H_2\} = v(u), \qquad 0 \le u \le 1;$$

and for $u < 1$, $v(u) = \int_0^u f_2(u)\,du$, where $f_2(u) \equiv v'(u)$ is the right-derivative of $v(u)$.

Let $h(u) = u - v(u)$ and $h^* = \sup\{h(u) \mid 0 \le u < 1\}$. We have $h^* > 0$, except in the case $v(u) = u$, $0 \le u < 1$, which is the uninformative experiment $(r_1, r_2) = (0, 0)$, for which the conclusion of the theorem holds trivially. Assuming $h^* > 0$, the function $h(u)$ is concave, $h(0) = h(1) = 0$, and $h(u) > 0$ for $0 < u < 1$; $h(u)$ is continuous, except possibly at $u = 1$, corresponding to a possible discontinuity of $v(u)$ at $u = 1$. If $v(u)$ is discontinuous at $u = 1$, we define $h(1)$ as multiple-valued, having all values in the closed interval $[0, 1 - v(1-)]$; then in all cases $h(u)$ is a graphically continuous concave curve on the closed unit interval. The right-derivative of $h(u)$ is $h'(u) \equiv 1 - v'(u)$, for $u < 1$.

For each $h$, $0 \le h < h^*$, the equation $h(u) = h$ has two distinct roots, which we designate $u_1(h) < u_2(h)$. The equation $h(u) = h^*$ is satisfied on a closed interval or at a single point $u$, which we designate by $u_1(h^*) \le u \le u_2(h^*)$, $u_1(h^*) \le u_2(h^*)$. The function $u_1(h)$ is continuous, convex, and strictly increasing in $h$, $0 \le h \le h^*$. The function $u_2(h)$ is continuous, concave, and nonincreasing; it is strictly decreasing in $h$ for $1 - v(1-) \le h \le h^*$ (that is, for $0 \le h \le h^*$, unless $v(u)$ is discontinuous at $u = 1$), and $u_2(h) = 1$ for $0 \le h \le 1 - v(1-)$. Let $u_i'(h)$ denote the respective right-derivatives of $u_i(h)$, for $0 \le h < h^*$; then

$$u_i'(h) = [1 - f_2(u_i(h))]^{-1} \quad \text{for } 0 \le h < h^*, \quad i = 1, 2.$$

Corresponding to each $h$, $0 \le h < h^*$, we define the simple binary experiment

$$E_h:\ (r_1[h], r_2[h]) = (\log f_2(u_1(h)),\ \log f_2(u_2(h))).$$

Corresponding to $h = h^*$, we take $(r_1[h^*], r_2[h^*]) = (0, 0)$. These experiments are clearly strictly ordered.

Let

$$G(h) = \begin{cases} 1 - [u_2(h) - u_1(h)], & 0 \le h < h^*, \\ 1, & h = h^*. \end{cases}$$

Let $g(h) = u_1'(h) - u_2'(h)$, for $0 \le h < h^*$. Then $G(h) = \int_0^h g(h)\,dh$ for $0 \le h < h^*$.
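For a concrete case, the construction can be carried out numerically for the normal experiment $E_N$ of Example 2, whose canonical form is $v(u) = \Phi(\Phi^{-1}(u) - 1)$ (take $U = \Phi(X + \tfrac12)$), so that $f_2(u) = e^{z - 1/2}$ with $z = \Phi^{-1}(u)$. The sketch below assumes plain bisection for the two roots $u_1(h) < u_2(h)$ and reports $(r_1[h], r_2[h])$ and $G(h)$. By symmetry one expects $r_1[h] = -r_2[h]$, and, since $r_2[h]$ decreases as $h$ increases, $G(h) = 1 - G(r_2[h])$ with $G(r_2)$ as in Example 2. Helper names and the cutoff `EPS` are illustrative choices.

```python
from math import log
from statistics import NormalDist

# Sketch: the Section 5 construction applied to E_N (means -1/2, +1/2; unit variance).
Z = NormalDist()        # Phi = Z.cdf, phi = Z.pdf, Phi^{-1} = Z.inv_cdf
EPS = 1e-12             # keeps inv_cdf away from 0 and 1

def v(u):
    """Canonical form v(u) = Phi(Phi^{-1}(u) - 1) of E_N."""
    return Z.cdf(Z.inv_cdf(u) - 1.0)

def f2(u):
    """Likelihood ratio v'(u) = exp(z - 1/2), with z = Phi^{-1}(u)."""
    z = Z.inv_cdf(u)
    return Z.pdf(z - 1.0) / Z.pdf(z)

def h(u):
    return u - v(u)

U_PEAK = Z.cdf(0.5)     # f2 = 1 here, so h attains h* at this point
H_STAR = h(U_PEAK)

def root(level, lo, hi, increasing):
    """Bisection for h(u) = level on a branch [lo, hi] where h is monotone."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if (h(mid) < level) == increasing:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def component(level):
    """Return ((r1[h], r2[h]), G(h)) for a level 0 < h < h*."""
    u1 = root(level, EPS, U_PEAK, increasing=True)
    u2 = root(level, U_PEAK, 1 - EPS, increasing=False)
    return (log(f2(u1)), log(f2(u2))), 1 - (u2 - u1)

def G_example2(r2):
    """Mixing c.d.f. of Example 2: Phi(r2 - 1/2) - Phi(-r2 - 1/2)."""
    return Z.cdf(r2 - 0.5) - Z.cdf(-r2 - 0.5)

for level in (0.05, 0.2, 0.35):
    (r1, r2), G_h = component(level)
    # H <= h exactly when R2 >= r2[h], so G(h) should equal 1 - G_example2(r2[h]).
    print(f"h = {level}: r1 = {r1:.6f}, r2 = {r2:.6f}, "
          f"G(h) = {G_h:.6f}, 1 - G(r2) = {1 - G_example2(r2):.6f}")
```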

2. We define the experiment $E_G$ as the mixture $G = G(h)$ of the strictly ordered simple binary experiments $E_h: (r_1[h], r_2[h])$, $0 \le h \le h^*$. We proceed to prove that $E = E_G$ by proving that $v(u) = v_G(u)$, $0 \le u \le 1$, where $v_G(u)$ is the canonical form of $E_G$.

For each $h < h^*$, the simple experiment $E_h: (r_1[h], r_2[h])$ is equivalent to an experiment consisting of one observation on the random variable $U_h$, having the following distributions:

$$\mathrm{Prob}\{U_h = u_1(h) \mid H_i\} = q_i(h), \qquad \mathrm{Prob}\{U_h = u_2(h) \mid H_i\} = p_i(h) = 1 - q_i(h), \qquad i = 1, 2,$$

where $q_i(h)$, $i = 1, 2$, are determined by

$$r_1[h] = \log[q_2(h)/q_1(h)], \qquad r_2[h] = \log[p_2(h)/p_1(h)].$$

For $h = h^*$, the experiment $(r_1[h^*], r_2[h^*]) = (0, 0)$ is equivalent to the trivial experiment consisting of one observation on the random variable $U_{h^*}$, which has, under $H_1$ and $H_2$, the same uniform distribution on the interval $[u_1(h^*), u_2(h^*)]$. Hence $E_G$ is equivalent to the mixture experiment in which one observation $h$ is taken on an auxiliary randomization variable $H$ with the c.d.f. $G(h)$ defined above, independent of the hypotheses, followed by one observation on the corresponding random variable $U_h$, whose distributions under $H_1$, $H_2$ were given above. Each possible outcome of this mixture experiment has the form $(h, u_h)$, where $h$ is the observed value of $H$ and $u_h$ is the observed value of $U_h$.

For different values of $h$, the ranges of $U_h$ are disjoint; hence the observed value $h$ is a function of the observed value $u_h$, and the latter is a sufficient statistic for $E_G$. The distributions of the statistic $u_h$ are those of the random variable $U_H$, which are determined as follows.


Let $W_i(u) = \mathrm{Prob}\{U_H \le u \mid H_i\}$, $0 \le u \le 1$, $i = 1, 2$. We have $W_i(1) = 1$; and since $\mathrm{Prob}\{H = 0\} = G(0) = 0$, $W_i(0) = 0$, for $i = 1, 2$.

For $0 < u < u_1(h^*)$, we have

$$W_i(u) = \int_0^u w_i(u)\,du, \qquad i = 1, 2,$$

where

$$w_i(u) = g(h(u))\,q_i(h(u))/u_1'(h(u)) = [u_1'(h(u)) - u_2'(h(u))]\,q_i(h(u))/u_1'(h(u)).$$

(On this range $U_H = u_1(H)$, which occurs with probability $q_i(H)$; $H$ has density $g$, and the change of variable $u = u_1(h)$ contributes the factor $1/u_1'(h)$.) Since $q_1(h) = [f_2(u_2(h)) - 1]/[f_2(u_2(h)) - f_2(u_1(h))]$, it follows that

$$w_1(u) = \left[1 - \frac{u_2'(h(u))}{u_1'(h(u))}\right] \cdot \frac{f_2(u_2(h(u))) - 1}{f_2(u_2(h(u))) - f_2(u_1(h(u)))}.$$

We have $u_1(h(u)) = u$, and for brevity we write here $u_2$ for $u_2(h(u))$, for $0 < u < u_1(h^*)$. Using $u_i'(h) = [1 - f_2(u_i(h))]^{-1}$, we obtain

$$w_1(u) = \left(1 - \frac{1 - f_2(u)}{1 - f_2(u_2)}\right) \cdot \frac{f_2(u_2) - 1}{f_2(u_2) - f_2(u)} = 1.$$

Since $q_2(h(u)) = f_2(u)\,q_1(h(u))$, we have, for $0 < u < u_1(h^*)$,

$$w_2(u) = f_2(u).$$

In the same way the same formulae for $w_i(u)$ can be verified for the range $u_2(h^*) < u < 1$. If $\mathrm{Prob}\{H = h^*\} = 0$, then $u_1(h^*) = u_2(h^*)$ and $\mathrm{Prob}\{U_H = u_1(h^*) \mid H_i\} = 0$ for $i = 1, 2$. If $\mathrm{Prob}\{H = h^*\} > 0$, then $u_1(h^*) < u_2(h^*)$, and by definition we have, for $u_1(h^*) < u \le u_2(h^*)$, $w_1(u) = w_2(u) = f_2(u) = 1$. Thus $v_G(u) = \int_0^u f_2(u)\,du = v(u)$ for $0 \le u < 1$, and $v_G(1) = v(1) = 1$, completing the proof that $E_G = E$.
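When $E$ has finitely many outcomes, $v(u)$ is piecewise linear, $h(u)$ is piecewise linear and concave, and the construction of the proof can be carried out exactly: between consecutive corner levels of $h$, $u_1(h)$ and $u_2(h)$ move along fixed segments with slopes $f_2 < 1$ and $f_2 > 1$, and $u_i'(h) = [1 - f_2(u_i(h))]^{-1}$ gives the $G$-mass of each level interval. The sketch below assumes exact corners and no jump of $v$ at $u = 1$; the function name is illustrative. Applied to the binomial experiment $E_B$ of the preceding section, it should return the strictly ordered decomposition $G^0$, with masses .4112, .4352 and .1536 on $E_2$, $E_1$ and $E_0$.

```python
from fractions import Fraction as F
from math import log

def decompose(vertices):
    """Section 5 construction for a piecewise-linear convex canonical form v(u) (sketch).

    vertices = [(0, 0), ..., (1, 1)] are the exact corners of v(u); v is assumed to have
    no jump at u = 1. Returns {(L1, L2): mass}, where L_i = f2(u_i(h)) are the likelihood
    ratios of the component E_h and mass is its G-probability; (1, 1) is the atom at h*.
    """
    hv = [u - v for u, v in vertices]                       # h(u) = u - v(u) at the corners
    segs = [(k, (vertices[k + 1][1] - vertices[k][1]) / (vertices[k + 1][0] - vertices[k][0]))
            for k in range(len(vertices) - 1)]              # (k, slope f2 on segment k)
    levels = sorted(set(hv))                                # 0 = levels[0] < ... < h*
    parts = {}
    for a, b in zip(levels, levels[1:]):
        # u1(h) runs along the segment where h increases through [a, b] (f2 < 1) ...
        sL = next(s for k, s in segs if s < 1 and hv[k] <= a and hv[k + 1] >= b)
        # ... and u2(h) along the segment where h decreases through [a, b] (f2 > 1).
        sR = next(s for k, s in segs if s > 1 and hv[k] >= b and hv[k + 1] <= a)
        # G(b) - G(a) = [u1(b) - u1(a)] - [u2(b) - u2(a)], with u_i'(h) = 1/(1 - f2).
        mass = (b - a) / (1 - sL) + (b - a) / (sR - 1)
        parts[(sL, sR)] = parts.get((sL, sR), 0) + mass
    flat = sum((vertices[k + 1][0] - vertices[k][0] for k, s in segs if s == 1), F(0))
    if flat:
        parts[(F(1), F(1))] = flat                          # atom of G at h = h*
    return parts

# Corners of the canonical form of the binomial experiment E_B (Example 1).
VB = [(F(0), F(0)), (F(256, 625), F(1, 625)), (F(512, 625), F(17, 625)),
      (F(608, 625), F(113, 625)), (F(624, 625), F(369, 625)), (F(1), F(1))]

for (L1, L2), mass in decompose(VB).items():
    print(f"(r1, r2) = ({log(float(L1)):+.4f}, {log(float(L2)):+.4f})  G-mass = {float(mass):.4f}")
```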

The binomial example of the preceding section illustrates that in general a binary experiment may be equivalent to many distinct mixtures $G$ of simple experiments; that is, we may have $E_G = E_{G'}$ although $G(r_1, r_2) \ne G'(r_1, r_2)$. However, if $E$ has a canonical form $v(u)$ which is linear on the interval $0 \le u \le u_1(h^*)$ (that is, if the right-derivative $v'(u)$, $0 \le u < 1$, assumes only one value less than unity), it follows that any $G$ such that $E_G = E$ is a distribution concentrated on a single vertical line (all components having the same $r_1$) and possibly on the origin, and hence that $E$ has a unique decomposition into simple components. A similar sufficient condition for uniqueness of $G$ such that $E_G = E$ is that $u_2(h^*) = 1$, or else that $v(u)$ be linear between $(u_2(h^*), v(u_2(h^*)))$ and $(1, 1)$. It can be shown that a necessary, as well as sufficient, condition for such uniqueness is that one of these conditions on $v(u)$ hold, or that $E$ be uninformative.

6. The partial ordering of binary experiments. Generalizing Section 3 to binary experiments which need not be simple, we have the same implicit definition: $E$ is at least as informative as $E^*$ if and only if it is possible to use $E$, possibly supplemented by use of an auxiliary randomization variable, to construct an experiment equivalent to $E^*$. This is equivalent to the (generalized) explicit definition:

$E: v(u)$ is at least as informative as $E^*: v^*(u)$ if and only if $v(u) \le v^*(u)$ for $0 \le u < 1$.

To prove the necessity of the latter condition for the former, we note that if for some $u' < 1$ we have $v(u') > v^*(u')$, then the best test of $H_1$ with size $1 - u'$ based upon $E$ (which rejects $H_1$ when $U > u'$, and so has power $1 - v(u')$) has smaller power than that based upon $E^*$, so that the latter cannot be constructed from the former with possible randomization.
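Because the canonical forms of experiments with finitely many outcomes are piecewise linear, the explicit definition can be checked exactly by comparing two forms at the union of their corners. The sketch below (illustrative helper names, exact fractions) applies this to the experiments of Table I and to the binomial experiment $E$ of Figure I. It can be used to confirm the partial ordering listed with that figure: $E_3$ and $E_4$ are incomparable, and $E$ lies between $E_0$ and $E_2$ but is incomparable with $E_1$.

```python
from fractions import Fraction as F

# Illustrative helpers (hypothetical names).
def simple_vertices(L1, L2):
    """Corners of the canonical form of the simple experiment with ratios L1 <= 1 <= L2."""
    if L1 == L2:                                  # uninformative: v(u) = u
        return [(F(0), F(0)), (F(1), F(1))]
    p1 = (1 - F(L1)) / (F(L2) - F(L1))
    return [(F(0), F(0)), (1 - p1, (1 - p1) * L1), (F(1), F(1))]

def v_at(vertices, u):
    """Value at u of the piecewise-linear canonical form with the given corners."""
    for (ua, va), (ub, vb) in zip(vertices, vertices[1:]):
        if ua <= u <= ub:
            return va + (vb - va) * (u - ua) / (ub - ua)

def at_least_as_informative(va, vb):
    """Explicit definition: E_a is at least as informative as E_b iff v_a(u) <= v_b(u).
    Both forms are piecewise linear, so comparing at all of their corners suffices."""
    return all(v_at(va, u) <= v_at(vb, u) for u in {u for u, _ in va + vb})

EXPERIMENTS = {
    "E0": simple_vertices(1, 1),
    "E1": simple_vertices(F(1, 16), 16),
    "E2": simple_vertices(F(1, 256), 256),
    "E3": simple_vertices(F(1, 16), 256),
    "E4": simple_vertices(F(1, 256), 16),
    "E": [(F(0), F(0)), (F(256, 625), F(1, 625)), (F(512, 625), F(17, 625)),
          (F(608, 625), F(113, 625)), (F(624, 625), F(369, 625)), (F(1), F(1))],
}

for a, va in EXPERIMENTS.items():
    below = [b for b, vb in EXPERIMENTS.items() if b != a and at_least_as_informative(va, vb)]
    print(a, "is at least as informative as", below)
```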

The proof of sufficiency can be based upon the representation of $E$ as the equivalent mixture of ordered simple experiments $E_h = (r_1[h], r_2[h])$, $0 \le h \le h^*$, constructed in the proof of the theorem of Section 5. For each $h$, let $E_{h,z}$ denote a simple binary experiment not more informative than $E_h$ (that is, $E_{h,z} \le E_h$), for each $z$, $0 \le z \le 1$. Let $Z$ be an auxiliary randomization variable uniformly distributed on the unit interval. Then $E$ is clearly at least as informative, in the sense of the implicit definition, as the mixture experiment $E^*$ which consists of taking one observation $(h, z)$ on the pair of independent variables $(H, Z)$, and selecting for use the corresponding simple experiment $E_{h,z}$. If the families of experiments $E_{h,z}$ are chosen in all possible ways, then it can be verified that the resulting mixture experiments $E^*$ have respective canonical forms $v^*(u)$ including all possible cases satisfying $v^*(u) \ge v(u)$, $0 \le u < 1$. This completes the proof of equivalence of the definitions.

B. Inference methods with probabilistic justifications

7. On the mathematical treatment of statistical inference problems. In the modern theory of probability and its standard applications in the empirical sciences, the concept of an experiment occupies a central and basic position. The term probability experiment is useful to denote any completely specified mathematical probability model, consisting of a specified sample space $S = \{x\}$ of possible outcomes $x$, a suitable family of subsets $A$ of $S$, and a probability function $P(A)$, defined on those sets, which satisfies certain axioms. The term statistical experiment is useful to denote a specified set of two or more probability experiments, having the same sample space and family of subsets but in general different probability functions; each such probability experiment may be labeled by a parameter point $\theta$, and the parameter space $\Omega = \{\theta\}$ is the set of such labels; in this context, each such probability experiment represents a (simple) statistical hypothesis $H_\theta$. The problems of statistical inference treated by mathematical statistics are formulated on the basis of specified statistical experiments. (This includes problems of experimental design, which concern the appraisal and comparison of alternative statistical experiments.)

The character of objectivity which is basic to modern probability theory has two sources. Its mathematical structure, based on unequivocal and consistent formal definitions of its terms, may be called mathematically objective, in contrast with some earlier versions of probability theory. When a probability experiment is used in relation to a physical phenomenon, with its mathematical elements linked (directly or indirectly) to physical entities or events which are observable or verifiable (in fact or in principle) through unequivocal and consistent coordinating definitions, the resulting interpreted mathematical model may be called physically objective.

The character of objectivity which is a basic feature and goal of modern mathematical statistics is based first of all on these features of modern probability theory, and on their interpretations in the contexts of statistical experiments and the situations where these are applied. Discussions of statistical inference problems which do not have specified statistical experiments as their frames of reference are usually considered unsatisfactory, and lacking in objectivity.

Of the situations in which problems of statistical inference or decision-making may be considered, the simplest in one respect are those in which it is postulated that one of only two completely specified statistical hypotheses (two simple hypotheses), $H_1$ or $H_2$, is true; that is, binary statistical experiments. While it is only in rare cases that an inference situation may be described with useful adequacy by such a simple model, a thorough mathematical study of such inference problems is useful because this extreme of mathematical simplicity allows convenient development of mathematical results, and of their interpretations, which can be generalized appropriately for more complex problems.

A statistical experiment $E$, for example a binary one, is a mathematical model which is assumed to represent adequately and completely one aspect of a situation in which statistical inference problems may be considered; it is only this aspect of an inference situation which has been the subject of discussion in the preceding sections.

The remaining aspects of an inference situation include:

(a) the conclusions or decisions among which a choice must be made on the basis of an observed outcome $x$ of the experiment $E$;

(b) the consequences of each possible choice, considered in turn on the respective assumptions that each of the simple hypotheses is true; and

(c) the evaluations of such alternative consequences by the individual in the inference situation; his purposes; and possibly his prior opinions or information concerning the hypotheses.

The specification of these additional aspects of an inference situation in appropriate and formal terms is often difficult or problematical, even when all of the general features of the inference situation are quite clear. Even in such cases, if at least aspect (a) can be specified definitely, for example by stating that just two conclusions or decisions are allowed, then it is possible to give an analysis of the inference problem having general usefulness in connection with various formal or informal specifications of the remaining aspects (b) and (c). Such analysis allows a general appraisal and comparison of the possible inference rules, and thereby an appraisal of the value of the experiment $E$ itself (and possibly comparison with alternative possible experiments) for the inference purposes in view. In cases where adequate formal specifications of aspects (b) and (c) are possible, this analysis retains intrinsic interest as part of the analysis of a more complete formal model of the inference situation.

8. Two-decision problems; tests of statistical hypotheses. We consider here any inference situation for which some specified binary experiment $E$ is assumed to be an appropriate model. With respect to aspect (a) of such a situation, referred to above, the simplest possible specification is one which allows only two possibilities, which may be denoted conveniently by $d_1$, that decision or conclusion which would be considered more appropriate if $H_1$ were known to be true, and the alternative $d_2$, which may be referred to as the decision or conclusion "reject $H_1$." In this case, any specified rule of inference or decision-making can be represented by an explicit inference function (or decision function) $d(x)$, defined (and measurable) on the sample space $S = \{x\}$ of $E$, and taking values $d_1$ or $d_2$ only: if the rule leads to decision $d_i$ when $x$ is observed, we have $d(x) = d_i$, for each $x$, and for $i = 1$ or 2. Since any binary experiment $E$ can be represented without loss of generality in its canonical form $v(u)$ as defined above, it is convenient here to take the sample space of any binary experiment to be the closed unit interval, $S = \{u \mid 0 \le u \le 1\}$; then without loss of generality any inference rule has the form $d(u)$, and may be any Lebesgue-measurable function on $0 \le u \le 1$ with values $d_1$ or $d_2$ only. (This form has the secondary advantage of eliminating the need for separate consideration of randomized decision rules.)


For each such rule $d(u)$, let $\alpha = \alpha_d = \mathrm{Prob}[d(U) = d_2 \mid H_1]$ and $\beta = \beta_d = \mathrm{Prob}[d(U) = d_1 \mid H_2]$. These two parameters describe completely the probability properties of the inference rule $d(u)$ (and the distributions of the random variable $d(U)$) under $H_1$ and $H_2$: $\alpha$ and $\beta$ are, respectively, the probabilities of errors of Types I and II when a specified inference function is used.

Each inference function $d(u)$ is to be appraised (directly or indirectly) in terms of its error-probabilities $\alpha_d$, $\beta_d$. Since the general criterion of appraisal is that both error-probabilities should be as small as possible, and since in general relatively small values of $\alpha$ are possible only along with relatively large values of $\beta$, and conversely, such appraisal in general leads to a partial ordering of the functions $d(u)$. In the exceptional and (statistically) trivial case of the completely informative experiment ($v(u) = 0$ for $0 \le u < 1$), by taking $d(u) = d_1$ if $u < 1$, and $d(1) = d_2$, we obtain $\alpha_d = \beta_d = 0$; here there is no need to consider alternative inference functions, and the one described can be called simply "best" for all inference purposes. If $E$ is not completely informative, then for each $\alpha$, $0 \le \alpha \le 1$, let $d_\alpha(u) = d_2$ if $u \ge 1 - \alpha$, and let $d_\alpha(u) = d_1$ if $u < 1 - \alpha$; then the error-probabilities of $d_\alpha(u)$ are $\alpha$ and

$$\beta = \beta(\alpha) = \begin{cases} v(1 - \alpha), & 0 < \alpha \le 1, \\ v(1-), & \alpha = 0. \end{cases}$$
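In canonical form the best tests are thresholds on $u$, so $\beta(\alpha)$ can be read off the $v(u)$ curve. The sketch below does this for the binomial experiment $E_B$ of Example 1 (Section 4), whose canonical form has corners (0, 0), (.4096, .0016), (.8192, .0272), (.9728, .1808), (.9984, .5904), (1, 1); for instance $\beta(.05) = v(.95) = .158$. For a discrete experiment such as this one, a threshold on $u$ corresponds to a randomized test on the original outcome $x$. Helper names are illustrative.

```python
from fractions import Fraction as F

# Corners of the canonical form v(u) of the binomial experiment E_B (Example 1).
VB = [(F(0), F(0)), (F(256, 625), F(1, 625)), (F(512, 625), F(17, 625)),
      (F(608, 625), F(113, 625)), (F(624, 625), F(369, 625)), (F(1), F(1))]

# Illustrative helpers (hypothetical names).
def v_at(vertices, u):
    """Piecewise-linear interpolation of v(u)."""
    for (ua, va), (ub, vb) in zip(vertices, vertices[1:]):
        if ua <= u <= ub:
            return va + (vb - va) * (u - ua) / (ub - ua)

def beta(alpha, vertices=VB):
    """Type II error-probability of d_alpha: beta(alpha) = v(1 - alpha).
    This v has no jump at u = 1, so v(1-) = v(1) also covers alpha = 0."""
    return v_at(vertices, 1 - F(alpha))

def alpha_prime(vertices=VB):
    """alpha' = 1 - max{u : v(u) = 0}; the zero set of v is an interval starting at 0."""
    return 1 - max(u for u, v in vertices if v == 0)

def d_alpha(u, alpha):
    """Best test of H1 against H2 at level alpha, in canonical form."""
    return "d2" if u >= 1 - alpha else "d1"

for a in (F(1, 100), F(1, 20), F(1, 10)):
    print(f"alpha = {float(a):.2f}: beta = {float(beta(a)):.4f}")
print("alpha' =", alpha_prime())
```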


Since the likelihood ratio statistic of $E$ is $v'(u)$, a non-decreasing function of $u$, we have, by the fundamental lemma of Neyman and Pearson, that for each $\alpha$, $\beta(\alpha)$ is the smallest Type II error-probability attainable by use of any inference function having a Type I error-probability not exceeding $\alpha$; that is, $d_\alpha(u)$ is a best test of $H_1$ against $H_2$ of significance level $\alpha$.

Let $\alpha' = \min\{\alpha \mid \beta(\alpha) = 0\} = 1 - \max\{u \mid v(u) = 0\}$. If $\alpha' < 1$, then for any $\alpha > \alpha'$, the inference function $d_\alpha(u)$ is inadmissible for reasonable use, since $d_{\alpha'}(u)$ is strictly better, having a smaller Type I error-probability while both inference functions have Type II error-probabilities equal to 0. If $0 \le \alpha_1 < \alpha_2 \le \alpha'$, then $\beta(\alpha_1) > \beta(\alpha_2)$, and the inference functions $d_{\alpha_1}(u)$ and $d_{\alpha_2}(u)$ are not comparable. The inference functions $d_\alpha(u)$, $0 \le \alpha \le \alpha'$, constitute a minimal essentially complete class of (admissible) inference functions. For the problem considered, on the basis of the given experiment $E$, no other inference functions need be given consideration; but no further analysis or simplification of the problem of choosing one of these inference functions can be given except in relation to formal or informal specifications of the aspects (b) and (c) of the inference situation referred to in the preceding Section.

9. Multi-decision problems; tests based on critical levels. To illustrate most simply that even with a binary experiment it is sometimes appropriate to allow more than two possible decisions (or conclusions), consider the case in which three decisions are allowed. Assume that decision $d_1$ would be considered the most appropriate of the three possibilities, and that $d_2$ would be considered the least appropriate, if $H_1$ were known to be true; and that $d_1$ would be considered least appropriate, and $d_2$ most appropriate, if $H_2$ were known to be true; the remaining decision, $d_3$, is then more appropriate than $d_2$ if $H_1$ is true, and more appropriate than $d_1$ if $H_2$ is true. An example would be a situation of industrial acceptance sampling in which it is assumed that each lot of items, to be classified after inspection of a small sample of items, contains either a certain small proportion of defective items ($H_1$) or a certain higher proportion of defective items ($H_2$); and the possible classifications are: $d_1$, "apparently high quality"; or $d_2$, "apparently low quality"; or $d_3$, "indeterminate quality." (It is not difficult to specify formally possible costs of sampling and conditions of use of variously classified lots under which such a three-classification procedure would be appropriate.) Another type of example is represented by designating $d_1$ as the conclusion "reject $H_2$ (in favor of $H_1$)," $d_2$ as the conclusion "reject $H_1$ (in favor of $H_2$)," and $d_3$ as the conclusion "reject neither hypothesis" or "no conclusion."

As in the preceding Section, any definite inference procedure here can be represented without loss of generality by some function $d(u)$, defined on the unit interval, taking values $d_1$, $d_2$, or $d_3$. The relevant properties of any such function are just the four error-probabilities $\alpha_i$, $\beta_i$, $i = 1, 2$, where

$\alpha_1 = \mathrm{Prob}[d(U) = d_2 \mid H_1]$ = the probability of a "major Type I error,"
$\alpha_2 = \mathrm{Prob}[d(U) = d_3 \mid H_1]$ = the probability of a "minor Type I error,"
$\beta_1 = \mathrm{Prob}[d(U) = d_1 \mid H_2]$ = the probability of a "major Type II error," and
$\beta_2 = \mathrm{Prob}[d(U) = d_3 \mid H_2]$ = the probability of a "minor Type II error."


Clearly the general goal, in appraising and selecting an inference function of this form, based on a given binary experiment, is that each of these error-probabilities should be suitably small. If the function $\beta(\alpha)$ is defined as above, then for any values of $\alpha_1$ and $\alpha_2$ such that $0 \le \alpha_1 + \alpha_2 \le \alpha'$ (no other cases should be considered), we have (by the Neyman-Pearson lemma) that the smallest possible value of $\beta_1$ is $\beta(\alpha_1 + \alpha_2)$, and the smallest possible value of $\beta_2$ is $\beta(\alpha_1) - \beta(\alpha_1 + \alpha_2)$; and that these are the error-probabilities of the admissible three-decision function

$$d(u) = \begin{cases} d_1, & u < 1 - \alpha_1 - \alpha_2, \\ d_3, & 1 - \alpha_1 - \alpha_2 \le u < 1 - \alpha_1, \\ d_2, & 1 - \alpha_1 \le u. \end{cases}$$
Comments like those of the preceding Section apply to the problem of choice of a particular inference function of this form. Any inference or decision function of this form has the probabilistic justification that its four error-probabilities are "jointly minimum," in the sense that no one of them could be reduced except by allowing an increase in one or more of the others. The policy of using such an inference or decision function, having suitably small error-probabilities, is thereby justified in the sense that in many independent applications, under the respective hypotheses, the relative frequencies of the more and less serious errors of various kinds will tend to be correspondingly small.
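The frequency justification can be illustrated by simulation. The sketch below assumes the binomial experiment $E_B$ of Example 1 and represents it in canonical form by the usual randomization: given $X = x$, $U$ is drawn uniformly on $[\mathrm{Prob}\{X < x \mid H_1\}, \mathrm{Prob}\{X \le x \mid H_1\}]$, so that $U$ is uniform under $H_1$ and has c.d.f. $v(u)$ under $H_2$. With $\alpha_1 = .01$ and $\alpha_2 = .04$, the relative frequencies should come out close to $\alpha_1$, $\alpha_2$ under $H_1$ and close to $\beta_1 = .158$, $\beta_2 = .298$ under $H_2$. The seed, sample size and helper names are arbitrary illustrative choices.

```python
import random
from math import comb

THETA = {"H1": 0.2, "H2": 0.8}            # binomial experiment E_B of Example 1 (n = 4)
F1 = [comb(4, x) * 0.2 ** x * 0.8 ** (4 - x) for x in range(5)]
LEFT = [sum(F1[:x]) for x in range(5)]    # Prob{X < x | H1}

def canonical_u(x, rng):
    """Randomized canonical statistic: uniform on [Prob{X < x | H1}, Prob{X <= x | H1}].
    It is uniform under H1; under H2 its c.d.f. is v(u), since larger x has larger ratio."""
    return LEFT[x] + rng.random() * F1[x]

def three_decision(u, a1, a2):
    if u < 1 - a1 - a2:
        return "d1"
    if u < 1 - a1:
        return "d3"
    return "d2"

def frequencies(hypothesis, a1=0.01, a2=0.04, n=100_000, seed=1):
    """Relative frequencies of d1, d2, d3 in n independent applications under one hypothesis."""
    rng = random.Random(seed)
    theta = THETA[hypothesis]
    counts = {"d1": 0, "d2": 0, "d3": 0}
    for _ in range(n):
        x = sum(rng.random() < theta for _ in range(4))
        counts[three_decision(canonical_u(x, rng), a1, a2)] += 1
    return {d: c / n for d, c in counts.items()}

print("H1:", frequencies("H1"))   # target: d2 near alpha1 = .01, d3 near alpha2 = .04
print("H2:", frequencies("H2"))   # target: d1 near beta1 = .158, d3 near beta2 = .298
```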

The preceding discussion can clearly be generalized immediately to allow any finite number of possible decisions or conclusions, simply ordered according to their decreasing appropriateness if $H_1$ is true (and increasing appropriateness if $H_2$ is true). Similarly, an infinite number (not necessarily countable) can be allowed. In all such cases, the admissible inference or decision functions, having probabilistic justifications of the kind illustrated above, will have a form in which larger values of the outcome $u$ tend to indicate conclusions or decisions which are more appropriate when $H_2$ is true.

An inference technique which antedates modern mathematical statistics, and which remains in wide use, is that based on the critical level associated with an observed outcome. When an appropriate statistic has been selected, for example the statistic $u$, the critical level is defined as the probability, under a hypothesis $H_1$ being tested, of a value of $U$ at least as large as the value observed:

$$a(u) = \mathrm{Prob}[U \ge u \mid H_1]$$

(in the canonical form, where $U$ is uniform under $H_1$, $a(u) = 1 - u$). Observed values of $a(u)$ more or less close to 0 are customarily interpreted as representing more or less strong evidence for rejection of $H_1$; one convention of interpretation, which is clearly rather schematic, applies the term "significant" to outcomes $a(u) \le .05$, and the term "highly significant" to outcomes $a(u) \le .01$. Leaving aside interpretations which ascribe to a numerical value of $a(u)$ some intrinsic meaning as a quantitative measure of strength of evidence against $H_1$ in an outcome $u$, there remains the qualitative simple ordering of conclusions, with those favoring $H_2$ more strongly corresponding to smaller values of $a(u)$.
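For the binomial experiment $E_B$ of Example 1, taking the number of successes $X$ as the statistic, the critical level is an upper-tail binomial probability under $\theta = .2$. The sketch below (illustrative names) computes it and applies the schematic .05/.01 convention quoted above; for example, $x = 3$ gives $a = .0272$ ("significant") and $x = 4$ gives $a = .0016$ ("highly significant").

```python
from fractions import Fraction as F
from math import comb

# H1 distribution of X (number of successes) in the binomial experiment E_B of Example 1.
f1 = [comb(4, x) * F(1, 5) ** x * F(4, 5) ** (4 - x) for x in range(5)]

# Illustrative helpers (hypothetical names).
def critical_level(x):
    """a(x) = Prob[X >= x | H1]: probability under H1 of a value at least as large as x."""
    return sum(f1[x:], F(0))

def conventional_label(a):
    """The schematic convention quoted in the text (.05 and .01 thresholds)."""
    if a <= F(1, 100):
        return "highly significant"
    if a <= F(5, 100):
        return "significant"
    return "not significant"

for x in range(5):
    a = critical_level(x)
    print(x, float(a), conventional_label(a))
```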


This latter qualitative part of the customary interpretation of various possible values of the critical level, considered in the context of a specified experiment, has the kind of probabilistic and frequency justification described above. In addition, the numerical values of $a(u)$ have probabilistic interpretations related to various errors of Type I; for example, any interpretation of outcomes $a(u) \le .01$ as "strong evidence against $H_1$" will be highly inappropriate if $H_1$ is true, but will be made with probability only .01 when $H_1$ is true. However, techniques based upon critical levels do not incorporate systematic consideration of error-probabilities under $H_2$.

While the theory of Neyman and Pearson introduced the essential complementary concept of errors of Type II, the formal development and the applications of this theory have typically been based on fixed-level formulations, and hence have typically treated only two-decision problems. The preceding discussion shows that a simple adaptation of the standard fixed-level theory and methods gives multi-decision and corresponding inference methods which have the flexibility and intuitive appeal of the traditional critical-level technique, and also an appropriately complete objective probabilistic appraisal and justification based on consideration of probabilities of errors of all kinds and degrees, in the context of a specified statistical experiment.


C. Inference methods with intrinsic justifications

10. Evidential interpretations of outcomes. A traditional and basic type of application of inference techniques like those discussed above occurs in empirical research situations, where such techniques are used to help in drawing informative inferences or conclusions of a general character, concerning statistical hypotheses of interest, from observations. For brevity, we use the term informative inference to refer to such problems and processes. In situations of informative inference, when a test of a statistical hypothesis (appropriately valid and efficient) indicates rejection of that hypothesis on the basis of an observed experimental outcome, this is interpreted customarily as evidence against that hypothesis (and as evidence favoring an alternative hypothesis). In fact, the essential respect in which testing methods (and related inference techniques) appear useful for purposes of informative inference seems to be the possibility of such interpretations, which for brevity we term evidential interpretations (of test outcomes, or more generally of experimental outcomes).

The preceding Section illustrated that interpretations of this kind can be related systematically to inference techniques having comprehensive objective probabilistic justifications. Such interpretations may be expressed by phrases such as "moderately strong evidence against H1 (for H2)," or "inconclusive evidence," or "very strong evidence for H2." In fact, they can be formally incorporated within such techniques as far as may be desired. Such evidential interpretations can in this way be given probabilistic meaning and justification, in the sense that the policy of making such interpretations leads, in repeated applications, to suitably small frequencies of interpretations which are inappropriate or misleading in various degrees, under the respective hypotheses. In each case, such probabilistic meanings and justifications are conferred on the notions and terms of evidential interpretations only within a specific frame of reference, namely, a specific mathematical model of some statistical experiment.

Any term of such evidential interpretations, such as "moderately strong evidence against H1," may be given in this way a precise probabilistic meaning in the context of each of several different binary experiments, including some non-comparable experiments. The several formal definitions given in this way to one such term may be considered roughly similar and comparable. Strictly speaking, however, these definitions should be considered formally and substantially different, since they depend respectively upon different and non-comparable contexts. That is, the scope or range of such meanings of evidential terms is in general limited to the specific mathematical model of a statistical experiment upon which the definition of such a term is based.

On the other hand, suppose that such definitions of one such term are made in different inference situations which happen to be represented by the same mathematical model (or by equivalent models) of a binary experiment, and that these definitions coincide formally in their relations to this common model. Then we may properly say that the term has been given an objective probabilistic meaning and interpretation whose scope and range embrace those two inference situations. In this sense, we may say that the term has an intrinsic (objective probabilistic) meaning and interpretation which is the same in the two different inference situations.

(It is to be recalled that we are restricting consideration to evidential interpretations of the outcomes of the statistical experiments under consideration, and leaving aside aspects (b) and (c) of inference situations referred to in Section 7 above. The latter aspects may be quite different, and need not be comparable, in two inference situations represented by the same mathematical model of a statistical experiment. In particular, indirect evidence or a priori opinion concerning hypotheses may be quite different (quantitatively or qualitatively) in two such situations represented by the same model E. Our discussion may be said to be restricted to just the outcome of E, and to the interpretation only of its contribution to the total body of evidence or opinion of various sorts which may be available in one or another inference situation.)

11. Symmetric simple binary experiments. It is convenient to refer to the outcome r2 of any simple binary experiment (r1, r2) as "positive," and to the outcome r1 as "negative." A simple binary experiment will be called symmetric if r1 = -r2, that is, if the experiment is of the form (-r2, r2). In the present section we consider only experiments of this form. Each such experiment is characterized by a number r2, 0 ≤ r2 ≤ ∞. This class of experiments is simply ordered by the parameter r2, according to the relation "more informative than" defined in Section 3 above.

There is no difficulty in recognizing the appropriate evidential interpretations of outcomes of the extreme cases in this class of experiments. The completely informative experiment (-∞, ∞) gives outcomes each of which can naturally be called completely informative: the outcome r = ∞ supports the certain inference that H1 is false and H2 is true. An alternative interpretation, which is equivalent for all purposes of application, is that the inference that H2 is true is practically certain, in the highest possible degree. Similarly, the outcome r = -∞ supports the certain inference that H1 is true. The uninformative experiment (0, 0) gives outcomes each of which can naturally be called (completely) uninformative. An outcome r = 0 has no relevance to the hypotheses, and therefore gives no support in any degree to any inferences concerning the hypotheses.

Consider any intermediate case (-r2, r2), 0 < r2 < ∞. The experiment is neither uninformative nor completely informative (but incompletely informative), and it is symmetric. Hence it is natural and necessary to attribute to each possible outcome a positive but limited character as evidence relevant to H1 or H2. It is likewise natural and necessary to attribute to the positive outcome the qualitative evidential property of supporting H2 (as against H1), and to the negative outcome the property of supporting H1. In addition to their intrinsic plausibility, these qualitative evidential properties attributed to the possible numerical values of r (r2 or -r2) have a concrete objective probabilistic interpretation and justification. Under each hypothesis, the probability that such an interpretation will be qualitatively inappropriate is equal to

$$\alpha = \alpha[r_2] = \frac{1}{1 + e^{r_2}} \le \frac{1}{2}.$$

These are the probabilities of a "false positive" (Type I error) and of a "false negative" (Type II error) in the obvious simplest testing or two-decision procedure.


If 0 < r2' < r2'' < ∞, we interpret the positive outcome of the experiment (-r2'', r2'') as supporting H2 more strongly than the positive outcome of the experiment (-r2', r2'). In addition to its intrinsic plausibility, this interpretation is supported by the following consideration: outcomes statistically equivalent to those of the latter experiment can be generated by modifying outcomes from the former experiment through the "addition of pure noise" unrelated to the hypotheses, in the sense of Section 3 above. This ordering of outcomes r of various symmetric simple binary experiments attributes to larger values of |r| greater strength as evidence relevant to H1 or H2. It has the probabilistic interpretation and justification that α[r2''] < α[r2'], since the function α[r2] decreases from 1/2 to 0 as r2 increases from 0 to ∞.
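The error-probability α[r2] and its inverse r2 = log[(1 - α)/α] can be tabulated directly. The sketch below (helper names hypothetical) computes α[r2] in the algebraically equivalent form e^(-r2)/(1 + e^(-r2)), which avoids overflow for large r2 and returns 0 for r2 = ∞. It illustrates that α[r2] falls from 1/2 toward 0 as r2 grows.

```python
import math

def alpha_of_r2(r2: float) -> float:
    """alpha[r2] = 1 / (1 + e^{r2}) for the symmetric simple experiment (-r2, r2), r2 >= 0.

    Computed as e^{-r2} / (1 + e^{-r2}), which is algebraically equal, cannot
    overflow, and gives 0.0 for r2 = math.inf.
    """
    e = math.exp(-r2)
    return e / (1.0 + e)

def r2_of_alpha(alpha: float) -> float:
    """Inverse map r2 = log[(1 - alpha) / alpha], for 0 < alpha <= 1/2."""
    return math.log((1.0 - alpha) / alpha)

if __name__ == "__main__":
    for r2 in (0.0, math.log(4), math.log(16), math.log(99), math.log(256), math.inf):
        print(f"r2 = {r2:8.4f}   alpha = {alpha_of_r2(r2):.4f}")
```

For example, α[log 4] = .2, α[log 99] = .01 and α[log 256] = 1/257 ≈ .0039. The last two values reappear later in this report.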

In summary, over the class of symmetric simple binary experiments, the absolute value |r| of any outcome r gives a simple ordering of such outcomes as to the strength of evidence they contain. The algebraic sign of any outcome r gives the direction in which such evidence points, as favoring H1 or favoring H2, with r = 0 uninformative and irrelevant. Each such interpretation admits concrete objective probabilistic interpretations and justifications in terms of probabilities of errors. Within this restricted class of experiments, we may say that the mathematically defined function r = log[f2(x)/f1(x)] has been given an unequivocal and consistent set of interpretations. We may also say that the interpreted mathematical function r = r(x) is an objective, internally consistent and efficient indicator of the evidence in experimental outcomes relevant to the hypotheses.
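To illustrate these conventions, the following sketch maps a pair of likelihood values f1(x), f2(x) to r = log[f2(x)/f1(x)], and then to the direction (sign of r) and strength (|r|) of the evidence. Treating an outcome with f1(x) = 0 as r = ∞ (and one with f2(x) = 0 as r = -∞) is our reading of the completely informative case. The function names are hypothetical.

```python
import math

def log_likelihood_ratio(f1: float, f2: float) -> float:
    """r = log[f2(x)/f1(x)], with log(c/0) = +inf and log(0/c) = -inf for c > 0."""
    if f1 == 0.0 and f2 == 0.0:
        raise ValueError("outcome is impossible under both hypotheses")
    if f1 == 0.0:
        return math.inf    # certain inference in favor of H2
    if f2 == 0.0:
        return -math.inf   # certain inference in favor of H1
    return math.log(f2 / f1)

def evidential_reading(r: float) -> tuple[str, float]:
    """Direction (from the sign of r) and strength (|r|) of the evidence in outcome r."""
    if r > 0:
        direction = "favors H2"
    elif r < 0:
        direction = "favors H1"
    else:
        direction = "uninformative"
    return direction, abs(r)

if __name__ == "__main__":
    # positive outcome of a symmetric instrument with error-probability .2
    print(evidential_reading(log_likelihood_ratio(0.2, 0.8)))
```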


12. Symmetric binary experiments. A binary experiment E, not necessarily simple, will be called symmetric if its canonical form v(u) is symmetric about the line u + v = 1; that is, if for each u, 0 ≤ u ≤ 1, we have v(1 - v(u)) = 1 - u. For any such experiment, the method of proof of the decomposition theorem of Section 5 above gives a mixture experiment, equivalent to E, each of whose simple components has the symmetric form (-r2, r2). As in Example 2 of Section 4 above, any such mixture can be represented by a (generalized) c.d.f. G(r2) on the range 0 ≤ r2 ≤ ∞. For any given symmetric binary experiment E, let E_G denote this equivalent mixture experiment.
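The symmetry condition can be checked numerically. The sketch rests on our own reading, not a definition quoted from the text: the canonical form of a simple binary experiment with probability α1 of a false positive and α2 of a false negative is taken to be the piecewise-linear curve through (0, 0), (1 - α1, α2) and (1, 1). This agrees with the experiment E' described in Section 15 below, for which α1 = α2 = .01. The sketch tests v(1 - v(u)) = 1 - u on a grid. It handles only 0 < α1, α2 < 1, since vertical segments of the graph would need separate treatment. All names are hypothetical.

```python
def simple_canonical_form(alpha1: float, alpha2: float) -> list[tuple[float, float]]:
    """Assumed vertices of v(u) for a simple experiment (0 < alpha1, alpha2 < 1)."""
    return [(0.0, 0.0), (1.0 - alpha1, alpha2), (1.0, 1.0)]

def evaluate(vertices: list[tuple[float, float]], u: float) -> float:
    """Piecewise-linear interpolation of v at u, for 0 <= u <= 1."""
    for (u0, v0), (u1, v1) in zip(vertices, vertices[1:]):
        if u0 <= u <= u1:
            if u1 == u0:
                return v1
            return v0 + (v1 - v0) * (u - u0) / (u1 - u0)
    raise ValueError("u outside [0, 1]")

def is_symmetric(vertices: list[tuple[float, float]], grid: int = 1001, tol: float = 1e-9) -> bool:
    """Check v(1 - v(u)) == 1 - u at evenly spaced u values in [0, 1]."""
    for k in range(grid):
        u = k / (grid - 1)
        if abs(evaluate(vertices, 1.0 - evaluate(vertices, u)) - (1.0 - u)) > tol:
            return False
    return True

if __name__ == "__main__":
    print(is_symmetric(simple_canonical_form(0.01, 0.01)))  # equal error-probabilities
    print(is_symmetric(simple_canonical_form(0.01, 0.2)))   # unequal error-probabilities
```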

Since E_G and E are mathematically equivalent, in particular for purposes of informative inference and related questions of evidential interpretation of outcomes, we can consider any outcome r of E as if it were a mathematically corresponding outcome of E_G. Each outcome of this symmetric mixture experiment E_G has the form ((-r2, r2), r), where r = r2 or -r2. The value r2 is observed from a random variable having the same known distribution G(r2) under each hypothesis. The observed value r2 is therefore irrelevant as evidence concerning the hypotheses. It determines which symmetric simple binary experiment (-r2, r2) is performed. Hence r2 = |r| indicates, as in the preceding Section, just the strength of the evidence provided by the outcome r of the experiment (-r2, r2). It is possible to interpret the outcome r of the latter experiment in the way established in the preceding Section for outcomes of symmetric simple binary experiments. In fact, for purposes of informative inference, the outcome r of E_G must be so interpreted. The appropriate frame of reference for considering the evidential character of r is clearly the selected simple experiment, and the structure of E_G is otherwise clearly irrelevant to such interpretations.

Because of the equivalence of E and E_G, and the related equivalence between outcomes of the two respective experiments having numerically equal values r, we obtain from the preceding paragraph the following general conclusion. Given any symmetric binary experiment E, for purposes of informative inference, any outcome r of E must be interpreted evidentially in the same way as a numerically equal outcome of a symmetric simple binary experiment. In particular, given r, the mathematical form of E is irrelevant for such purposes and interpretations.

To illustrate this conclusion in concrete terms, a physical interpretation of Example 1 of Section 4 above may be useful. Suppose that four measurement instruments (or techniques of observation) are available in an investigation concerning two hypotheses. Each instrument gives dichotomous outcomes, "positive" or "negative." Each instrument is also symmetric, in the sense that it has equal probabilities α of false positives and of false negatives. Let the simple experiments E0, E1, E2 defined in Example 1 represent three of these instruments respectively, each used without replication (to obtain a single observation). Let the fourth instrument have α = .2, and let E denote the experiment consisting of four independent measurements by this instrument; then E is the binomial experiment of Example 1.
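The binomial experiment E can be examined directly. The sketch below computes r(x) for x positive results in four measurements with α = .2. Each positive result contributes log 4 to r, and each negative result contributes -log 4. It then computes the distribution of the strength |r| under H1 and under H2. The two distributions coincide, which is what permits E to be represented as a random selection among symmetric simple experiments. The resulting weights are our own computation from the model as stated. They are offered for comparison with the probabilities g_i of Example 1, which are not reproduced here. Names are hypothetical.

```python
import math
from collections import defaultdict

N = 4          # number of independent measurements
ALPHA = 0.2    # false-positive = false-negative probability of the instrument
STEP = math.log((1 - ALPHA) / ALPHA)  # contribution of one positive result to r (= log 4)

def pmf(x: int, p: float) -> float:
    """Binomial probability of x positive results when each is positive with probability p."""
    return math.comb(N, x) * p**x * (1 - p)**(N - x)

def r_of_x(x: int) -> float:
    """r(x) = log[f2(x)/f1(x)]: +STEP per positive result, -STEP per negative result."""
    return (2 * x - N) * STEP

def strength_distribution(p_positive: float) -> dict[float, float]:
    """Distribution of |r(X)| when each measurement is positive with probability p_positive."""
    dist: dict[float, float] = defaultdict(float)
    for x in range(N + 1):
        dist[abs(r_of_x(x))] += pmf(x, p_positive)
    return dict(dist)

under_h1 = strength_distribution(ALPHA)       # a positive result has probability .2 under H1
under_h2 = strength_distribution(1 - ALPHA)   # and .8 under H2
```

Both dictionaries assign probabilities .1536, .4352 and .4112 to |r| = 0, log 16 and log 256 respectively.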


Let E_G denote an experimental procedure in which one of the first three instruments is selected at random, with the respective probabilities g_i given in the Example, and the selected instrument is used to obtain a single measurement. With this procedure, if the worthless instrument E0 happens to be selected, one may fairly plead victimization by rather improbable bad luck. Indeed, one had good reason to hope for and expect selection of a more informative instrument. However, these considerations are irrelevant to the problem of making inferences about the hypotheses from a measurement provided by E0. For this problem, the only relevant considerations are these: the instrument and its measurements are strictly worthless, and this outcome of the experiment E_G recognizably provides no contribution whatsoever to the inference problem.

In terms of the binomial experiment E, the outcome x = 2 corresponds (under the mathematical equivalence of E with E_G) to the selection of E0 in E_G (and the occurrence of either of its outcomes). Hence there is no reason to give the binomial outcome x = 2 interpretations differing in any respect from those just described for the outcome E0 of E_G. Nor is there any reason to consider any other aspect of the binomial model of the experiment E, for purposes of informative inference, given that r(x) = r(2) = 0, a recognizably (completely) uninformative outcome.

Suppose, alternatively, that in the mixture experiment E_G the most informative instrument, E2, is by good fortune selected. Granting that the occurrence of such good luck is irrelevant as evidence regarding the hypotheses, it is most relevant to the quality or strength of inferences which may be made from a measurement supplied by E2. Evidently there is no reason to qualify or weaken the resulting inference statements on the ground that one was not sure beforehand of having the good luck to use the best possible instrument. Suppose that use of the selected instrument E2 gives a positive outcome, r = log 256. Under the mathematical equivalence between E and E_G, this outcome corresponds to the outcome x = 4 of the binomial experiment E, that is, to four positive outcomes in four independent measurements by the instrument having α = .2. For this outcome we also (necessarily) have r(x) = log 256. It follows that the outcome x = 4 of the binomial experiment E should be interpreted in exactly the same way, as evidence relevant to the hypotheses, as a positive outcome obtained in a single measurement by an instrument E2 having probability α = .0039 of false positives and of false negatives. The numerical value r = log[(1 - α)/α] = log 256 serves, by definition, as a compact abbreviation for such an evidential interpretation of the outcome x = 4. Analogous interpretations apply to the remaining possible outcomes x of E.
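The same reading extends to every outcome of E. The sketch tabulates, for each x, the following quantities:

- the likelihood ratio L(x) = f2(x)/f1(x) = 4^(2x - 4), in which the binomial coefficients cancel;
- r(x) = log L(x) and the direction of the evidence;
- the error-probability α = 1/(1 + e^|r|) of the symmetric simple experiment having the same strength.

For x = 4 this gives L = 256 and α = 1/257 ≈ .0039. For x = 2 it gives L = 1, the uninformative case discussed above. The function names are hypothetical.

```python
import math

N, ALPHA = 4, 0.2  # four measurements by an instrument with alpha = .2

def likelihood_ratio(x: int) -> float:
    """L(x) = f2(x)/f1(x) = ((1 - ALPHA)/ALPHA)**(2x - N); binomial coefficients cancel."""
    return ((1 - ALPHA) / ALPHA) ** (2 * x - N)

def equivalent_alpha(L: float) -> float:
    """Error-probability of the symmetric simple experiment whose outcome has strength |log L|."""
    return 1.0 / (1.0 + max(L, 1.0 / L))

for x in range(N + 1):
    L = likelihood_ratio(x)
    direction = "H2" if L > 1 else "H1" if L < 1 else "neither"
    print(f"x = {x}: L = {L:g}, r = {math.log(L):+.3f}, "
          f"alpha = {equivalent_alpha(L):.4f}, favors {direction}")
```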

13. Binary experiments in general. To extend the scope of the preceding evidential interpretations of the statistic r to binary experiments which are not necessarily symmetric, let E: v(u) be any binary experiment. Let E*: v*(u) be the "reflection" of v(u) in the line u + v = 1. That is, for each point (u', v(u')) of the (continuous) graph of v(u), let the graph of v*(u) contain the point (u'', v*(u'')) = (1 - v(u'), 1 - u'). Let E** = (1/2)E + (1/2)E*; that is, E** is the mixture experiment having E and E* as components, each with probability 1/2. Then E** is a symmetric binary experiment. Suppose the experiment E** were under consideration, and its component E were selected. Since E** is symmetric, any outcome r of E would then have to be interpreted evidentially in the way described in the preceding Section. Given the numerical value of r, the selection of E is irrelevant here.
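The reflection is easy to compute for a polygonal canonical form. Each vertex (u, v) maps to (1 - v, 1 - u), and reversing the vertex list keeps it ordered by u. The sketch also rests on an assumption of ours, not a definition quoted from the text: a simple experiment with false-positive probability α1 and false-negative probability α2 has vertices (0, 0), (1 - α1, α2), (1, 1), with outcomes r1 = log[α2/(1 - α1)] and r2 = log[(1 - α2)/α1]. Under that assumption, the reflection of the simple experiment (r1, r2) is the simple experiment (-r2, -r1), so that E** has outcomes ±r1 and ±r2. This is a consequence we derive, not a statement of the text. Names are hypothetical.

```python
import math

def reflect(vertices: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Reflect a canonical-form polygon in the line u + v = 1: (u, v) -> (1 - v, 1 - u).

    Reversing the list keeps the reflected vertices ordered by increasing u.
    """
    return [(1.0 - v, 1.0 - u) for (u, v) in reversed(vertices)]

def simple_vertices(alpha1: float, alpha2: float) -> list[tuple[float, float]]:
    """Assumed canonical form of a simple experiment: (0,0), (1 - alpha1, alpha2), (1,1)."""
    return [(0.0, 0.0), (1.0 - alpha1, alpha2), (1.0, 1.0)]

def outcomes(alpha1: float, alpha2: float) -> tuple[float, float]:
    """(r1, r2) = (log[alpha2/(1 - alpha1)], log[(1 - alpha2)/alpha1])."""
    return math.log(alpha2 / (1 - alpha1)), math.log((1 - alpha2) / alpha1)

if __name__ == "__main__":
    a1, a2 = 0.05, 0.3
    print(reflect(simple_vertices(a1, a2)))  # vertices of the simple experiment with (a2, a1)
    print(outcomes(a1, a2), outcomes(a2, a1))
```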

Returning to the given experiment E, any outcome r of E is equivalent, for purposes of informative inference, to an outcome of the mixture experiment E** in which the component E is first selected and the outcome r is then observed. It follows that the evidential interpretations of outcomes r of any binary experiment must be of the same kinds as those given in the cases discussed in the preceding Section.

14. Inferences based on the likelihood function. The results of the preceding analysis may be summarized as follows. When any binary experiment E is used for purposes of informative inference, and any specified outcome r of E is obtained, the mathematical structure of E is then irrelevant to those purposes; just the numerical value r is relevant. Any such observed numerical value r has an intrinsic objective probabilistic character as evidence relevant to H1 or H2, namely:

(a) the qualitative property that the outcome favors H2 if r is positive, favors H1 if r is negative, and is irrelevant as evidence if r = 0; and

(b) strength, as evidence, identical with that of a single outcome of the symmetric simple binary experiment (-r2, r2), where r2 = |r|.

The latter simple experiment has probabilities of false positives and of false negatives each equal to

$$\alpha = \alpha[|r|] = \frac{1}{1 + e^{|r|}} \le \frac{1}{2}.$$

In terms of any observed outcome x of E and the likelihood function [f1(x), f2(x)], this index of the strength of evidence in x is α = min{f1(x), f2(x)}/[f1(x) + f2(x)]. The direction of this evidence is positive (favoring H2 over H1) if f2(x) > f1(x), negative (favoring H1) if f1(x) > f2(x), and neutral if f1(x) = f2(x). In terms of the likelihood ratio statistic L(x) = f2(x)/f1(x), with r(x) = log L(x), the numerical values L(x) = 19 or 1/19 give α = 1/20 = .05, and L(x) = 99 or 1/99 give α = .01. Thus evidence which is at least moderately strong, in the sense of corresponding to small values of α, is represented by similarly small values of the likelihood ratio statistic L(x) or of its reciprocal. This is because α ≈ min[L(x), 1/L(x)] whenever the latter term is about .10 or less.
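The two expressions for the strength index can be compared numerically. The sketch below (function names hypothetical) computes α = min{f1(x), f2(x)}/[f1(x) + f2(x)] and the equivalent form 1/(1 + max[L, 1/L]). Writing m = min[L(x), 1/L(x)], one has α = m/(1 + m), so m overstates α by the relative amount m. This is why the approximation α ≈ m is adequate when m is about .10 or less.

```python
def alpha_from_likelihoods(f1: float, f2: float) -> float:
    """Strength index alpha = min(f1, f2) / (f1 + f2) of the evidence in outcome x."""
    return min(f1, f2) / (f1 + f2)

def alpha_from_ratio(L: float) -> float:
    """The same index in terms of L = f2/f1 > 0: 1 / (1 + max(L, 1/L))."""
    return 1.0 / (1.0 + max(L, 1.0 / L))

if __name__ == "__main__":
    for L in (19, 1 / 19, 99, 1 / 99, 9, 4):
        exact = alpha_from_ratio(L)
        approx = min(L, 1.0 / L)  # the approximation quoted in the text
        print(f"L = {L:8.4f}   alpha = {exact:.4f}   min(L, 1/L) = {approx:.4f}")
```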

Thus, for purposes of informative inference, we may say that an efficient indicator of the evidence relevant to statistical hypotheses H1, H2 in any given experimental outcome x of any experiment E is the likelihood function [f1(x), f2(x)]. More compactly, it is the likelihood ratio L(x) = f2(x)/f1(x), or r(x) = log L(x). The likelihood function (or L(x) or r(x)) is a mathematical term having direct, primitive, intrinsic, objective evidential meaning. This meaning can be interpreted in terms of directly associated error-probabilities, although the latter probabilities are in general not defined with reference to the structure of the experiment E from which the outcome x was obtained.


Suppose that evidential interpretations of observed values of r(x) are regarded within the frame of reference of the specific binary experiment E from which x is obtained. Then we have formally a particular case of the procedures discussed in Section 9 above, in which the various evidential interpretations of outcomes r(x) can be given probabilistic interpretations and justifications in terms of error-probabilities defined with reference to the form of E. However, despite their objective aspects, such evidential interpretations of outcomes r(x) are in general deficient for purposes of informative inference, to the extent that they differ from the evidential interpretations of the likelihood function described in the preceding paragraphs.


15. Appraisal and design of experiments for informative inference.

Granting that the structure of a binary experiment E is irrelevant to the problem of evidential interpretation of any given outcome x of E, there remain the important problems of appraising, comparing, and designing experiments for purposes of informative inference. Here the structure of any experiment E is most relevant. The partial ordering of binary experiments by the relation "more informative than," discussed in Section 6 above, is fundamental.

Beyond the basic consideration that, other things being equal, more informative experiments will be preferred, we may consider the value, as evidence, of each possible outcome r, that is, the evidential value of any numerical value r obtained as an outcome of any binary experiment E.

The analysis of the preceding Sections shows that an uninformative outcome r = 0 of any binary experiment is recognizably completely devoid of value as evidence relevant to the hypotheses.

Each of the completely informative outcomes, r = -∞ or ∞, of any binary experiment has the highest possible value as evidence, in the sense that such an outcome supports a certain inference in favor of one of the hypotheses. On the other hand, these outcomes have the same evidential strength, |r| = ∞. Even so, it is in general not appropriate to attribute the same evidential value to these two possible outcomes, since the significance and consequences of the respective hypotheses may be of quite different kinds and not comparable. Consequently, it is appropriate to attribute to the outcome r = -∞ the maximum possible value among all possible negative outcomes (r < 0, supporting H1), and to the outcome r = ∞ the maximum possible value among all possible positive outcomes (r > 0, supporting H2). No comparison should be made between the possible outcomes r = -∞ and r = ∞ as to their evidential values, notwithstanding the fact that these outcomes have the same evidential strength.

Any two possible positive outcomes, 0 < r' < r'' ≤ ∞, share the qualitative evidential property of supporting H2 against H1, and the outcome r'' supports H2 more strongly than does r'. Hence we attribute to the outcome r'' greater evidential value than to the outcome r'.

(If H1 is true, the inferences indicated by r'' will be more erroneous, and hence have less value, than those based on r'. This consideration has already been taken into account in the preceding Sections, in which the evidential interpretations found for each possible value of r were based directly on an appropriate consideration of the fact that each hypothesis was initially considered possibly true. To illustrate this point further, any outcome r'' > 0 of any binary experiment E can be interpreted evidentially as if it were an outcome of the simple experiment (-r'', r''). As in Example 3 of Section 2 above, the simple binary channel (-r'', r'') can be degraded by adding in series after it a suitable noisy channel, with noise affecting only the negative signal and consequently degrading only the positive signal. This gives an overall channel (-r'', r'), 0 < r' < r''. The positive outcome r'' of (-r'', r'') is then always followed by the positive outcome r' of (-r'', r'). The latter outcome is, however, also possible as the result of noise affecting the negative outcome -r'' of (-r'', r''). Hence the relation r'' > r' > 0 corresponds to the larger evidential value of r'' than of r'.)
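The channel argument can be made quantitative. In the sketch below, r2 plays the role of r'' in the text. The noise parameter p is hypothetical: it is the probability that the added channel turns a negative signal into a positive one, with positive signals passed unchanged. The negative outcome keeps its value -r2, while the positive outcome of the overall channel has a smaller value r' whenever 0 < p < 1, in agreement with the description above. The function name is hypothetical, and the sketch requires 0 < r2 < ∞.

```python
import math

def degrade_positive(r2: float, p: float) -> tuple[float, float]:
    """Pass the outputs of the symmetric channel (-r2, r2) through a noisy channel that
    turns a negative signal into a positive one with probability p (0 <= p < 1) and
    leaves positive signals alone. Returns (r1', r2') of the overall channel; 0 < r2 < inf."""
    e = math.exp(-r2)
    alpha = e / (1.0 + e)                 # common error-probability of (-r2, r2)
    pos1, pos2 = alpha, 1.0 - alpha       # P(positive input) under H1 and under H2
    # positive outputs arise from positive inputs or from flipped negative inputs
    out_pos1 = pos1 + (1.0 - pos1) * p
    out_pos2 = pos2 + (1.0 - pos2) * p
    # negative outputs arise only from unflipped negative inputs
    out_neg1 = (1.0 - pos1) * (1.0 - p)
    out_neg2 = (1.0 - pos2) * (1.0 - p)
    return math.log(out_neg2 / out_neg1), math.log(out_pos2 / out_pos1)

if __name__ == "__main__":
    print(degrade_positive(math.log(99), 0.1))  # negative outcome unchanged, positive weakened
```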


Similarly, if -∞ ≤ r'' < r' < 0, the outcome r'' has greater evidential value than the outcome r'. If r' < 0 < r'', in general no comparison as to evidential value is appropriate, for the reasons given above in the case r' = -∞, r'' = ∞.

The evidential meaning and value of any numerical value r(x), for any outcome x of any binary experiment E, are primitive (although objective) notions. It therefore seems inappropriate, for the most general purposes of informative inference, to impose on such numerical values, or on the probability distributions of r(X) under H1 and H2 in E, any numerical measure of evidence, information, or evidential value, or any average (expected value) of amounts or utilities of information. That is, for such purposes any numerical measure or index of the following kinds must have limited value:

- one intended to represent the amount of information in a specified outcome r(x), or in a specified outcome r(x) given, e.g., that H2 is true;
- one intended to represent the average (in some sense) amount or value of information provided by E, possibly given, e.g., that H2 is true.

Any such measure will fail to represent adequately all of the relevant evidential properties of outcomes r(x) and of their probability distributions under H1 and H2 in any specified statistical experiment. These properties can be recognized and appraised in their own primitive terms. Hence no simple ordering of binary experiments seems adequate or appropriate for such purposes in general. On the other hand, the partial ordering of such experiments by the relation "more informative than" is not only useful. It also seems appropriate and adequate for the general purposes of appraisal, comparison and design of experiments for typical purposes of informative inference, if it is used in conjunction with direct appraisal of specific experiments considered for possible use. (This partial ordering of the class of binary experiments was defined in Section 6 above in terms of the mathematical forms v(u) of binary experiments, without reference to the several possible types of applications of such experiments. For the applications to decision problems of each of the kinds described in Sections 8 and 9, the partial ordering of binary experiments with reference to the sets of relevant error-probabilities coincides in each case with the more basic partial ordering. Here, for applications of binary experiments for purposes of informative inference, we find once more that the same partial ordering is appropriate.)

The following example shows that any simple ordering of binary experiments might be inadequate for some purposes of informative inference. More generally, it illustrates the nature of direct appraisals of experiments for such purposes. Consider two (non-comparable) experiments:

- the experiment E, defined by v(u) = 0 for 0 ≤ u ≤ .5 and v(u) = u - .5 for .5 ≤ u ≤ 1;
- the symmetric simple experiment E', defined by v(u) = (1/99)u for 0 ≤ u ≤ .99 and v(u) = .01 + 99(u - .99) for .99 ≤ u ≤ 1.

(E' may alternatively be represented by (r1, r2) = (-log 99, log 99); E' allows a two-decision rule with error-probabilities each equal to .01.) Under H2, E provides with probability .5 a certain inference supporting H2, and with probability .5 an outcome which is recognizably completely uninformative. Under H2, E' always provides an outcome r having evidential strength |r| = log 99 (associated with error-probabilities .01). It seems clear that no simple ordering of even these two experiments will be adequate and satisfactory for the general purposes of informative inference. On the other hand, in any specific situation of informative inference, direct consideration of aspects (b) and (c) described in Section 7 above will provide an appropriate basis for choosing between two such experimental designs. In some contexts, evidential strength |r| = log 99 (and associated error-probabilities α = .01) may be regarded as of slight value compared with even a chance of obtaining an outcome which will support a certain inference. In other contexts, certainty of an outcome with the evidential strength guaranteed by E' may be strongly preferred to an appreciable risk of obtaining a completely uninformative outcome.
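The contrast between E and E' can be laid out as distributions, under H2, of the equivalent error-probability α[|r|] of each outcome. The outcome lists below are taken from the description above. E gives r = ∞ or r = 0 with probability .5 each. E' gives r = log 99 with probability .99 and r = -log 99 with probability .01, since its error-probabilities are .01. The helper names are hypothetical. The sketch shows only the two profiles and deliberately does not combine them into a single score, since the text argues that no such simple ordering is adequate in general.

```python
import math
from collections import defaultdict

# (r, probability under H2) for each possible outcome, as described in the text
E_OUTCOMES_H2 = [(math.inf, 0.5), (0.0, 0.5)]
E_PRIME_OUTCOMES_H2 = [(math.log(99), 0.99), (-math.log(99), 0.01)]

def alpha_of_strength(s: float) -> float:
    """alpha[|r|] = 1/(1 + e^{|r|}), written with e^{-|r|} so that |r| = inf gives 0."""
    e = math.exp(-s)
    return e / (1.0 + e)

def strength_profile(outcomes: list[tuple[float, float]]) -> dict[float, float]:
    """Distribution under H2 of the equivalent error-probability alpha[|r|]."""
    profile: dict[float, float] = defaultdict(float)
    for r, prob in outcomes:
        profile[round(alpha_of_strength(abs(r)), 6)] += prob
    return dict(profile)

if __name__ == "__main__":
    print("E :", strength_profile(E_OUTCOMES_H2))        # alpha 0 or 1/2, each w.p. .5
    print("E':", strength_profile(E_PRIME_OUTCOMES_H2))  # alpha .01 with certainty
```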

These remarks do not detract from the usefulness of measures or indices of information in outcomes or experiments for specific purposes. Such measures may provide approximate descriptions and comparisons in forms which are particularly useful for experiments based on sufficiently large numbers of independent observations.

The basis of such appraisals and comparisons of experiments is the representation of each experiment by its canonical form v(u), or equivalently by its "error-probability" curve β(α). Such curves, and their analogues in more complicated experiments and inference problems (including testing hypotheses, estimation, and multi-decision problems), constitute a central basic part of the modern theory of mathematical statistics. The preceding Sections showed that, once the likelihood function of an outcome is given, the structure of experiments is irrelevant for purposes of making informative inferences from a given outcome of a specified experiment. The preceding paragraphs show, however, that this part of mathematical statistics, when suitably interpreted, constitutes a comprehensive theory which is appropriate and directly useful in the important problems of appraisal, comparison, and design of experiments for purposes of informative inference.

Example. A problem of designing a binary experiment for
purposes of informative inference, which has a simple specification
and solution, is the following. Suppose that a technique of
observation is available which gives outcomes $Y$ having, under $H_1$
and $H_2$, the respective elementary probability functions $g_1(y)$, $g_2(y)$.
Suppose that the choice of a sample size, or of a sequential sampling
rule, is at the disposal of the statistician, and that any number of
statistically independent observations may be taken. It is required
to find an experimental design (sampling rule) such that the
evidential strength and value of each possible outcome $x$ will
satisfy the bounds $L(x) \le L_1$ or $L(x) \ge L_2$, where $L_1$, $L_2$
are given numbers satisfying $L_1 < 1 < L_2$. (For example, if
$L_2 = 1/L_1 = 99$, this requirement has the interpretation that each
possible outcome must have at least the evidential strength
associated intrinsically, as above, with error-probabilities equal
to .01.) Subject to this requirement, it is further required that
the experimental design be efficient in the sense of minimizing the
required number of observations on $Y$, in some appropriate sense
(which may be probabilistic, and whose specification remains to be
made precise).

Each possible definition of a sampling rule, sequential or
non-sequential, determines a complete definition of a binary
experiment $E$, with sample space $S = \{x\}$, where the generic outcome $x$
has the form $x = (y_1, \ldots, y_n)$, with $n = n(x)$ not constant in general;
and where the outcome $X$ has, under $H_1$ and $H_2$ respectively, the
elementary probability functions

$$ f_j(x) = c(x)\, g_j(y_1)\, g_j(y_2) \cdots g_j(y_n), \qquad j = 1, 2, $$

where $c(x)$ is a function determined by the sampling rule. Letting
$r(y_i) = \log[g_2(y_i)/g_1(y_i)]$, we have, for each $x$ (the factor $c(x)$
cancelling in the ratio),

$$ r(x) = \log\frac{f_2(x)}{f_1(x)} = \sum_{i=1}^{n(x)} r(y_i). $$
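To make the decomposition concrete, the sketch below assumes a hypothetical normal model ($g_1 = N(\mu_1, \sigma^2)$, $g_2 = N(\mu_2, \sigma^2)$ with illustrative parameters mu1, mu2, sigma) and computes $r(x)$ as the sum of per-observation log-likelihood ratios. The sampling-rule factor $c(x)$ never has to be evaluated, because it cancels.

```python
import math

def normal_pdf(y, mu, sigma):
    """Normal density, used here as the hypothetical g_1 or g_2."""
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def r_single(y, mu1=0.0, mu2=1.0, sigma=1.0):
    """Per-observation log-likelihood ratio r(y) = log[g_2(y)/g_1(y)]."""
    return math.log(normal_pdf(y, mu2, sigma) / normal_pdf(y, mu1, sigma))

def r_outcome(ys, **params):
    """r(x) for an outcome x = (y_1, ..., y_n): the sum of r(y_i).
    Any factor c(x) from the sampling rule cancels and is ignored."""
    return sum(r_single(y, **params) for y in ys)

# With mu1 = 0, mu2 = 1, sigma = 1 the closed form is r(y) = y - 1/2,
# so the outcome below has r(x) = 0 + 1 - 1, approximately 0.
print(r_outcome([0.5, 1.5, -0.5]))
```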

Letting $b = \log L_1$ and $a = \log L_2$, the above inequalities become
$r(x) \le b$ or $r(x) \ge a$, for each possible outcome $x$ of $E$. In rather
special cases, such as binomial experiments for which a single
observation on $Y$ is sufficiently informative, we have, for each
possible value $y$ of $Y$, either $r(y) \le b$ or $r(y) \ge a$; in such special
cases, the experiment $E$ consisting of a single observation on $Y$ is
clearly the most efficient possible design, in each sense in which
efficiency may be defined formally. On the other hand, if the event
$b < r(Y) < a$ has positive probability under at least one hypothesis,
then it is necessary to adopt a sampling rule which in general allows
more than one observation; furthermore, in general (for
example in problems concerning means of normal distributions) any
rule specifying a fixed number of observations, however large, will
lead to positive (although possibly small) probabilities for the
event $b < r(X) < a$. It follows that in typical cases it is
necessary to use a sequential sampling rule to satisfy the requirement
that $r(x) \le b$ or $r(x) \ge a$ for each possible outcome $x$.
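For the normal-means case mentioned above, the claim that a fixed sample size leaves positive probability on $b < r(X) < a$ can be checked directly. Assuming $n$ independent $N(\mu_j, \sigma^2)$ observations and writing $\delta = (\mu_2 - \mu_1)/\sigma$, $r(X)$ is normal with mean $-n\delta^2/2$ under $H_1$ (and $+n\delta^2/2$ under $H_2$) and variance $n\delta^2$. The sketch computes the probability under $H_1$ for hypothetical values $\delta = 1$ and $a = -b = \log 99$.

```python
import math

def prob_inconclusive_h1(n, delta, b, a):
    """P(b < r(X) < a | H_1) for a fixed sample of n normal observations.
    Under H_1, r(X) is normal with mean -n*delta**2/2 and sd delta*sqrt(n)."""
    mean = -n * delta ** 2 / 2
    sd = delta * math.sqrt(n)
    z_b = (b - mean) / sd
    z_a = (a - mean) / sd
    # Phi(z_a) - Phi(z_b), written with erfc so that small upper-tail
    # probabilities are not lost to rounding near 1.
    return 0.5 * (math.erfc(z_b / math.sqrt(2)) - math.erfc(z_a / math.sqrt(2)))

a = math.log(99)   # L_2 = 99
b = -a             # L_1 = 1/99
for n in (1, 10, 50, 200):
    print(n, prob_inconclusive_h1(n, delta=1.0, b=b, a=a))
```

The probabilities shrink rapidly with $n$ but remain positive for every fixed $n$.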

For any sampling rule which leads only to outcomes satisfying
these inequalities, each outcome can be represented in the form
$x = (y_1, \ldots, y_{n'}, \ldots, y_n)$, where $1 \le n' \le n$ and

$$ n' = n'(x) = \min\Big\{ m : \sum_{i=1}^{m} r(y_i) \le b \ \text{or} \ \sum_{i=1}^{m} r(y_i) \ge a \Big\}. $$

The modified sampling rule, which terminates observation when $y_{n'}$ is
observed, for each $x$, never requires more observations and in
general requires fewer observations on $Y$ than the given rule; hence
the modified rule is at least as efficient, according to each
possible definition of efficiency, and in general more efficient.
The only sampling rule which cannot be strictly improved by such
modification, and which consequently gives the most efficient
solution of our problem of experimental design for informative
inference, is the sampling rule given by Wald (for the different
problem of testing sequentially between two simple hypotheses):
take observations $Y_1, Y_2, \ldots$ until for the first time the
inequality $b < \sum_{i=1}^{n} r(y_i) < a$ is violated. (The elementary
nature of the preceding determination of this sequential sampling rule
as most efficient for the informative inference problem contrasts
sharply with the difficult demonstration of its optimality for the
testing problem.)
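A minimal simulation of this rule, assuming the same hypothetical normal model ($g_1 = N(0,1)$, $g_2 = N(1,1)$, so $r(y) = y - 1/2$) with $a = -b = \log 99$ and sampling under $H_1$. The `max_steps` argument is a safety guard added for the sketch; it is not part of the rule, which places no bound on $n$.

```python
import math
import random

def sequential_rule(draw, r, b, a, max_steps=100_000):
    """Take observations until the cumulative log-likelihood ratio first
    leaves the open interval (b, a). Returns (n, r(x))."""
    total = 0.0
    for n in range(1, max_steps + 1):
        total += r(draw())
        if not (b < total < a):
            return n, total
    # Safety guard only; the rule itself places no bound on n.
    raise RuntimeError("max_steps reached")

def r_normal(y):
    """Hypothetical model: log[g_2(y)/g_1(y)] for N(1, 1) versus N(0, 1)."""
    return y - 0.5

rng = random.Random(1)

def draw_under_h1():
    return rng.gauss(0.0, 1.0)

a = math.log(99)   # L_2 = 99
b = -a             # L_1 = 1/99
results = [sequential_rule(draw_under_h1, r_normal, b, a) for _ in range(1000)]
print("average number of observations:", sum(n for n, _ in results) / len(results))
print("every outcome has r <= b or r >= a:",
      all(t <= b or t >= a for _, t in results))
```

Every simulated outcome meets the evidential requirement, at the cost of a random, unbounded sample size.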

In most actual inference situations, sequential sampling rules
allowing unbounded numbers of observations and stages of sampling
are impracticable. For such situations, the above definition of
efficiency in terms of just the number of observations required is
inappropriate, and a more appropriate formulation of the general
goal of efficiency includes some bound on the number of stages of
sequential sampling allowed, and perhaps a bound on the total
number of observations. On the other hand, as indicated above, any
sampling rule with an upper bound on the number of observations will
in general give $b < r(X) < a$ with positive probability under at
least one hypothesis. In this sense, the simple formulation given
above of the general goal that a binary experiment be sufficiently
informative is to a degree incompatible with the limited scope of
practicable, planned experimental procedures (although not
incompatible with a broad view of research investigations as typically
open-ended). This difficulty cannot be altogether circumvented
within the limited scope of planned experiments, but it can be dealt
with usefully by adopting a sampling rule, possibly non-sequential,
which reduces to suitably small values the probabilities of
the insufficiently informative outcomes $b < r(X) < a$. (By
taking a single sample containing a sufficiently large number $N$ of
observations, obtaining outcomes $x = (y_1, \ldots, y_N)$, and by using an
auxiliary randomization device to transform outcomes $x$ into outcomes
$z$, it is possible to satisfy the above inequalities exactly, in the
sense that for each possible outcome $z$ we have $r(z) \le b$ or $r(z) \ge a$.
While such devices have some use for decision problems, such methods
seem artificial and inappropriate for the basic general purposes of
informative inference, since they discard some available information;
it seems more appropriate for purposes of informative inference to
report $r(x)$ based upon the complete available outcome $x$ in any given
situation.)
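The effect of a bound on the number of observations can be seen by truncating the sequential rule. Under a hypothetical normal model ($Y \sim N(0,1)$ under $H_1$, $r(y) = y - 1/2$, $a = -b = \log 99$), the following Monte Carlo estimate, offered as an illustration rather than a result from the text, shows that a cap on the sample size leaves a positive fraction of insufficiently informative outcomes $b < r(X) < a$, shrinking as the cap grows.

```python
import math
import random

def truncated_rule(rng, b, a, max_n):
    """Sequential rule stopped after at most max_n observations, sampling under
    a hypothetical H_1: Y ~ N(0, 1) with r(y) = y - 1/2. Returns the final r(x)."""
    total = 0.0
    for _ in range(max_n):
        total += rng.gauss(0.0, 1.0) - 0.5
        if not (b < total < a):
            break
    return total

def inconclusive_fraction(max_n, runs=2000, seed=0):
    """Monte Carlo estimate of P(b < r(X) < a | H_1) with a = -b = log 99."""
    rng = random.Random(seed)
    a = math.log(99)
    b = -a
    finals = [truncated_rule(rng, b, a, max_n) for _ in range(runs)]
    return sum(b < t < a for t in finals) / runs

for cap in (5, 10, 20, 50):
    print(cap, inconclusive_fraction(cap))
```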

16. Relations between statistical evidence and significance
tests. Let $E$ be any informative binary experiment,
$v(u) \not\equiv u$, and for some $\alpha$, $0 < \alpha < \alpha'$, let $d_\alpha(u)$ be the best test of
level $\alpha$ as defined in Section 8 above. Then, as above, this test has
$\beta = 1 - v(1-\alpha)$, $0 < \beta < 1$. If outcomes of $E$ are reported only in
the form either $d_2$: "reject $H_1$" or $d_1$: "do not reject $H_1$" (or
"accept $H_1$"), then this significance test procedure is equivalent to
the simple binary experiment $E'$ in which the likelihood ratio
statistic $L$ has only the two possible values $L_1 = \beta/(1-\alpha) < 1$
(for $d_1$) or $L_2 = (1-\beta)/\alpha > 1$ (for $d_2$). Hence the outcome
"reject $H_1$" has strength, as evidence, corresponding to the value $L_2$
of the likelihood ratio statistic, and is associated intrinsically, in
the sense of Section 14 above, with the error-probability

$$ \alpha^* = \frac{1}{1 + L_2} = \frac{\alpha}{1 - \beta + \alpha}. $$

If the ratio $L_2$ of the test's power $(1 - \beta)$ to its level $\alpha$ is
not far above unity, then $\alpha^*$ is not far below .5, and the evidential
strength of the outcome "reject $H_1$" is correspondingly slight; this
can be the case, within wide limits, for any value of $\alpha$, including
very small values. Thus in binary experiments a small value of $\alpha$
does not in general imply high evidential strength in the outcome
"reject $H_1$", and the determination of the evidential strength of
such an outcome depends upon $\beta$ as well as $\alpha$, through the function
$L_2 = (1-\beta)/\alpha$. (Within a specified binary experiment, if $\alpha$ is
decreased, then $L_2$ is increased, at least if $v(u)$ is strictly convex;
however, the upper limit approached by $L_2$ as $\alpha$ decreases may or may
not be far above unity.)
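The quantities just defined are easy to tabulate. The sketch below computes $L_1$, $L_2$ and $\alpha^*$ from a test's level $\alpha$ and Type II error-probability $\beta$; the $(\alpha, \beta)$ pairs and the function name are hypothetical, chosen to show that a very small level combined with low power gives only slight evidential strength.

```python
def evidential_summary(alpha, beta):
    """Likelihood-ratio values and intrinsic error-probability for a test
    reported only as 'reject H_1' (d_2) or 'do not reject H_1' (d_1)."""
    L1 = beta / (1 - alpha)                 # value of L for d_1
    L2 = (1 - beta) / alpha                 # value of L for d_2
    alpha_star = alpha / (1 - beta + alpha) # equals 1 / (1 + L2)
    return L1, L2, alpha_star

for alpha, beta in [(0.01, 0.98), (0.001, 0.998), (0.05, 0.20), (0.01, 0.10)]:
    L1, L2, a_star = evidential_summary(alpha, beta)
    print(f"alpha={alpha}, beta={beta}: L2={L2:.3g}, alpha*={a_star:.3g}")
```

The first two rows have levels .01 and .001 but $L_2 = 2$ and $\alpha^* = 1/3$, so "reject $H_1$" carries little evidential strength in either case.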

On the other hand, if $\beta$ is appreciably below .5, then small
values of $\alpha$ correspond to similarly small values of $\alpha^*$, the
error-probability intrinsically associated with the outcome "reject $H_1$".
For example, $\beta < .25$ implies $\alpha/(1+\alpha) < \alpha^* < (4/3)\alpha$; if $\alpha$ is also
small, such inequalities imply that $\alpha^* \approx \alpha$. That is, if both $\alpha$ and $\beta$
are small, then the error-probability $\alpha^*$ corresponding to the
intrinsic evidential strength of the outcome "reject $H_1$" is
approximately equal to $\alpha$.
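The bounds quoted for $\beta < .25$ follow from $\alpha^* = \alpha/(1 - \beta + \alpha)$ together with $3/4 < 1 - \beta < 1$. A brute-force grid check, meant as a sanity check rather than a proof, confirms them; the grid spacing is an arbitrary choice.

```python
def alpha_star(alpha, beta):
    """Intrinsic error-probability of the outcome 'reject H_1'."""
    return alpha / (1 - beta + alpha)

def bounds_hold(alpha, beta):
    """Check alpha/(1 + alpha) < alpha* < (4/3) alpha."""
    a = alpha_star(alpha, beta)
    return alpha / (1 + alpha) < a < 4 * alpha / 3

grid_ok = all(
    bounds_hold(i / 1000, j / 1000)
    for i in range(1, 1000)   # 0 < alpha < 1
    for j in range(1, 250)    # 0 < beta < .25
)
print(grid_ok)  # True

# For small alpha and beta, alpha* is close to alpha.
print(alpha_star(0.01, 0.02))
```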

Parallel remarks apply to evidential interpretations of the
outcome "accept $H_1$".



While the preceding considerations clarify, and in important
cases support, certain qualitative and quantitative features of
customary uses and interpretations of significance tests as
techniques for informative inference, they do not completely support
the method of significance tests as such for purposes of informative
inference. For such purposes, the methods based directly on the
likelihood function, described in Section 14 above, are in
principle to be preferred, for the reasons given there.


D. Discussion

17. Relations of statistical evidence to prior information
and to conclusions. The preceding Sections 10-16 have
dealt with a single aspect of situations of informative inference:
the nature and properties of experimental outcomes as evidence
relevant to statistical hypotheses. If each statistical hypothesis
represented in a binary experiment is regarded initially as
possibly true, then in many situations evidence against one
hypothesis, if sufficiently strong, would support a conclusion that
that hypothesis is false. The general nature of conclusions in
various contexts of investigation, their uses, limitations, and
possible ultimate reversibility, are familiar (cf. Tukey [1]).
These features of conclusions, and the strength of statistical
evidence which would suffice in any given situation to support a
conclusion, are among the aspects of inference situations (like (b)
and (c) of Section 10 above) whose formal specification is
problematical. But the process by which informal consideration of
the various aspects of inference situations, including experimental
outcomes, sometimes leads to conclusions is familiar, and the
formal and objective evidential properties of experimental outcomes
analyzed above are conveniently assimilable in this process.

One aspect of an inference situation whose formal specification
is often problematical is that of prior opinions or information,
including relevant previous experience, indirect evidence, and
general theoretical considerations. Bayesian treatments of
inference problems, in which such considerations are represented by
a priori probabilities (in some sense) of the statistical hypotheses
considered, will not be discussed here, except to note that they
coincide with the informal process referred to in the preceding
paragraph in taking just the likelihood function as the appropriate
indicator of evidence in outcomes relevant to the hypotheses, and
that they differ only in their degree and mode of formalization of
this aspect of an inference situation.

18. References and acknowledgments. An inclusive annotated
bibliography on the foundations of statistics has been given by
Savage ([2], Appendix 3). The definitions, developments, and
interpretations given above have many points in common with other
writings. The model of a statistical experiment, in which each
statistical hypothesis considered possible is represented explicitly,
is basic in the theories of Neyman and Pearson and of Wald. The
mathematical form of the partial ordering of statistical experiments
is part of the theory of comparison of experiments (cf.
Blackwell, reference [B 16] in [2]); the style of derivation in
Part A above illustrates its intrinsic relation to the mathematical
definition of a statistical experiment, independent of specific
formulations of inference or decision problems (cf. also Shannon [3]).
The development of statistical decision theory due to Wald has
helped greatly to clarify the complex relations and distinctions
between the problems and purposes of informative inference and
decision-making. An example given by Cox [4] illustrated the
usefulness of mixtures of experiments for analysis of problems in
the foundations of statistical inference. The basic status and
role of the likelihood function in informative inference were pointed
out by Fisher and by Barnard [5]; however, the above method of
analysis and interpretation of the status and properties of the
likelihood function differs considerably from those of Fisher and
of Barnard. This status and role of the likelihood function
appears also within the various Bayesian theories of inference; from
the non-Bayesian standpoint developed above, there is much intrinsic
interest in the discussions of the likelihood function and its
uses given by such writers as Jeffreys, Good, and Savage.



Appendix. (Supplement to Part A)

The algebra of statistical experiments. An algebra of
statistical experiments is obtained by considering jointly the
various operations and relations described above for statistical
experiments, and the algebra of probability distributions of random
variables. The scope of this algebra may be illustrated as
follows.

Notations:

1. $E \ge E'$: $E$ is at least as informative as $E'$ (Section 6).

2. $E E'$: "series multiplication" of experiments
(Example 3, Section 2).

3. $\sum_i g_i E_i$ or $\int E \, dG(E)$: mixture of experiments, not
necessarily binary (Section 4 and its direct generalization).

4. $E * E'$: convolution of statistical experiments, representing
the experiment consisting of the statistically independent
combination of $E$ and $E'$. For example,
$E * E = E^{(2)}$ is the two-fold replication of $E$; that is,
the experiment consisting of two independent observations
on the random outcome $Y$ given by $E$, and having outcomes
$X = (Y_1, Y_2)$. This operation corresponds to the
convolution of random variables as follows: if $r_i$ is the
sufficient statistic (defined as above) of $E_i$, $i = 1, 2, 3$,
and if $E_3 = E_1 * E_2$, then, under each hypothesis $H_\theta$, the
distribution $P_3(r, \theta)$ of $R_3$ is the convolution of the
distributions $P_1(r, \theta)$ of $R_1$ and $P_2(r, \theta)$ of $R_2$:
$R_3 = R_1 + R_2$ and $P_3(r, \theta) = P_1(r, \theta) * P_2(r, \theta)$.
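The convolution rule $R_3 = R_1 + R_2$ can be illustrated with a discrete example. The sketch below assumes a hypothetical experiment $E$ consisting of one Bernoulli observation with success probability .2 under $H_1$ and .8 under $H_2$, so that $r = \pm\log 4$, and forms the distribution of $R$ for $E * E$ under $H_1$ by convolving the distribution of $R$ with itself.

```python
import math
from collections import defaultdict

def convolve(dist1, dist2, ndigits=12):
    """Distribution of R1 + R2 for independent R1, R2, each given as
    {r: probability}. Sums are rounded so that values that should coincide
    despite floating-point error are merged."""
    out = defaultdict(float)
    for r1, p1 in dist1.items():
        for r2, p2 in dist2.items():
            out[round(r1 + r2, ndigits)] += p1 * p2
    return dict(out)

log4 = math.log(4)
# Distribution of r under H_1 for one Bernoulli observation:
# y = 1 gives r = log(.8/.2) = log 4 (prob .2);
# y = 0 gives r = log(.2/.8) = -log 4 (prob .8).
R_H1 = {round(log4, 12): 0.2, round(-log4, 12): 0.8}

R2_H1 = convolve(R_H1, R_H1)   # E * E, the two-fold replication
for r, p in sorted(R2_H1.items()):
    print(f"r = {r:+.4f}  P = {p:.2f}")
```

The result places probability .64, .32 and .04 on $r = -2\log 4$, $0$ and $+2\log 4$.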




Examples:

1. If $E$ is not completely uninformative, then with increasing
$n$ we have convergence of $E^{(n)}$ to the completely informative
experiment, in the sense of convergence of distributions of the
sufficient statistic $r$ and in the sense of convergence of
canonical forms $v(u)$, which are formally distribution functions.
When $n$ is sufficiently large, the central limit theorems provide
useful approximations to the distributions of $R$ and thereby to the
form of $v(u)$.

2. If $E_1 > E_2$, then for any $E'$ we have $E' * E_1 \ge E' * E_2$.
The inequality is strict if and only if $E'$ is not completely
informative.

3. If $E_1 > E_2$, then for any $E'$ we have

$$ \tfrac{1}{2}E_1 + \tfrac{1}{2}E' > \tfrac{1}{2}E_2 + \tfrac{1}{2}E'. $$

Various  other  identities  and  inequalities  representing  transitivity, 
additivity,  and  other  relations  are  easily  demonstrated. 
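Example 1 can be made concrete with the hypothetical Bernoulli experiment used above (success probability .2 under $H_1$, .8 under $H_2$). The $n$-fold replication $E^{(n)}$ has $R = (2k - n)\log 4$, where $k$ is the number of successes, and the sketch below computes, under $H_1$, the probability that an outcome of $E^{(n)}$ has evidential strength below $\log 99$; this falls toward zero as $n$ increases, in line with convergence to the completely informative experiment.

```python
import math

def prob_weak(n, p=0.2, threshold=math.log(99)):
    """P(|R| < threshold) under H_1 for n independent Bernoulli(p) observations,
    where R = (2k - n) * log 4 and k ~ Binomial(n, p)."""
    log4 = math.log(4)
    return sum(
        math.comb(n, k) * p ** k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if abs((2 * k - n) * log4) < threshold
    )

for n in (1, 5, 10, 20, 50):
    print(n, round(prob_weak(n), 4))
```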




References

[1] Tukey, John W. "Conclusions vs. decisions," to be published.
[2] Savage, L. J. The Foundations of Statistics. J. Wiley, 1954.
[3] Shannon, Claude E. "A note on a partial ordering for
    communication channels," Information and Control, v. 1 (1958),
    pp. 390-397.
[4] Cox, D. R. "Some problems connected with statistical
    inference," Annals of Mathematical Statistics, v. 29 (1958),
    pp. 357-372.
[5] Barnard, G. A. "Statistical inference," Journal of the Royal
    Statistical Society, Supplement, v. 11 (1949), pp. 115-139.



